Misha Dmitriev created HADOOP-14523:
---------------------------------------

             Summary: OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
                 Key: HADOOP-14523
                 URL: https://issues.apache.org/jira/browse/HADOOP-14523
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Misha Dmitriev


I recently analyzed JVM heap dumps from Hive running a big workload. Two
excerpts from the analysis, done with jxray (www.jxray.com), are given below. It
turns out that nearly half of the live memory is taken by objects awaiting
finalization, and the biggest offender among them is class
OpensslAesCtrCryptoCodec:

{code}
  401,189K (39.7%) (1 of sun.misc.Cleaner)
     <-- Java Static: sun.misc.Cleaner.first
  400,572K (39.6%) (14001 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
     <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
  270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
     <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first

---------------------

  102,232K (10.1%) (1 of j.l.r.Finalizer)
     <-- Java Static: java.lang.ref.Finalizer.unfinalized
  101,676K (10.1%) (8613 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, java.util.zip.ZipFile$ZipFileInflaterInputStream, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
     <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: java.lang.ref.Finalizer.unfinalized
{code}

This heap dump was taken using 'jmap -dump:live', which forces the JVM to run a
full GC before dumping the heap. So we are looking at the heap right after GC,
and yet all these unfinalized objects are still there. I think this happens
because the JVM runs only a single finalization thread, so the queue of objects
that need finalization may be processed too slowly. My understanding is that
finalization works as follows:

1. When GC runs, it discovers that object x, which overrides finalize(), is
unreachable.
2. x is added to the finalization queue. So technically x is still reachable:
it occupies memory, and _all the objects that it references stay in memory as
well_.
3. The finalization thread processes objects from the finalization queue
serially, so x may stay in memory for a long time.
4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory
for a long time, it has likely been promoted to the Old Gen of the heap, so only
a full (old-generation) GC can clean it up.
5. When a full GC finally occurs, x gets cleaned up.

So finalization is formally reliable, but in practice a lot of unreachable but
unfinalized objects can flood the memory. I guess we are seeing all these
OpensslAesCtrCryptoCodec objects while they are in phase 3 above. The really bad
thing is that these objects in turn keep a whole lot of other objects in memory,
in particular JobConf objects. Such a JobConf has nothing to do with
finalization, yet the GC cannot release it until the corresponding
OpensslAesCtrCryptoCodec is gone.

Here is the OpensslAesCtrCryptoCodec.finalize() method, with my comments:

{code}
protected void finalize() throws Throwable {
  try {
    Closeable r = (Closeable) this.random;
    r.close();  // Relevant only when (random instanceof OsSecureRandom == true)
  } catch (ClassCastException e) {
    // random is not Closeable (i.e. not an OsSecureRandom): nothing to close
  }
  super.finalize();  // Not needed, no finalize() in superclasses
}
{code}

So finalize() in this class, which may keep a whole tree of objects in memory,
is relevant only when the codec is configured to use the OsSecureRandom class.
The latter reads random bytes from the configured file and needs finalization
to close the input stream associated with that file.

The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and add 
it to the only class from this "family" that really needs it, OsSecureRandom. 
That will ensure that only OsSecureRandom objects (if/when they are used) stay 
in memory awaiting finalization, and no other, irrelevant objects.
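A minimal sketch of the proposed placement, under a simplified model: `StreamBackedRandom` and `Codec` are hypothetical stand-ins for OsSecureRandom and OpensslAesCtrCryptoCodec (they are not the Hadoop classes), and a plain `Map` stands in for the JobConf. Only the class that owns the stream declares finalize(), so while it waits in the finalization queue it retains just itself and its stream, not the codec or the configuration.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FinalizerPlacementSketch {

  /** Hypothetical stand-in for OsSecureRandom: it owns the stream, so it owns the finalizer. */
  static class StreamBackedRandom implements Closeable {
    private final InputStream in;
    private boolean closed;

    StreamBackedRandom(InputStream in) {
      this.in = in;
    }

    /** Returns the next random byte (0-255) read from the underlying stream. */
    synchronized int nextByte() throws IOException {
      if (closed) {
        throw new IOException("random source is closed");
      }
      int b = in.read();
      if (b < 0) {
        throw new IOException("random source exhausted");
      }
      return b;
    }

    @Override
    public synchronized void close() throws IOException {
      if (!closed) {
        closed = true;
        in.close();
      }
    }

    synchronized boolean isClosed() {
      return closed;
    }

    // Safety net only: a pending finalization keeps this small object and its
    // stream alive, but nothing that merely references it (codec, config).
    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() throws Throwable {
      try {
        close();
      } finally {
        super.finalize();
      }
    }
  }

  /** Hypothetical stand-in for OpensslAesCtrCryptoCodec: declares no finalize() at all. */
  static class Codec {
    private final Map<String, String> conf; // stands in for the (large) JobConf
    private final StreamBackedRandom random;

    Codec(Map<String, String> conf, StreamBackedRandom random) {
      this.conf = conf;
      this.random = random;
    }

    byte[] generateSecureRandom(int len) throws IOException {
      byte[] out = new byte[len];
      for (int i = 0; i < len; i++) {
        out[i] = (byte) random.nextByte();
      }
      return out;
    }
  }

  public static void main(String[] args) throws Exception {
    StreamBackedRandom random =
        new StreamBackedRandom(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
    Codec codec = new Codec(new HashMap<>(), random);
    System.out.println(Arrays.toString(codec.generateSecureRandom(4))); // [1, 2, 3, 4]
  }
}
{code}

Once such a codec becomes unreachable, it and its Map can be reclaimed in the same GC cycle; only the random source detours through the finalizer queue.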

Note that with this solution the streams are still closed lazily, which in
principle may cause problems of its own. So the most reliable fix would be to
call OsSecureRandom.close() explicitly when it is no longer needed. But the
above fix is a necessary first step anyway: it removes the most acute memory
problem and does not make anything else worse than it currently is.
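For the explicit-close alternative, here is a hedged sketch in which the codec implements Closeable and closes its random source deterministically, for example via try-with-resources. The class names are hypothetical stand-ins, not Hadoop classes, and whether the real codec can be given such a lifecycle depends on how callers obtain and discard it, which this issue does not describe. The sketch also swaps the ClassCastException idiom for an instanceof check.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class ExplicitCloseSketch {

  /** Hypothetical stream-backed random source (stand-in for OsSecureRandom). */
  static class StreamBackedRandom implements Closeable {
    private final InputStream in;
    private volatile boolean closed;

    StreamBackedRandom(InputStream in) {
      this.in = in;
    }

    int nextByte() throws IOException {
      int b = in.read();
      if (b < 0) {
        throw new IOException("random source exhausted");
      }
      return b;
    }

    @Override
    public void close() throws IOException {
      if (!closed) {
        closed = true;
        in.close();
      }
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Hypothetical codec that releases its random source deterministically. */
  static class Codec implements Closeable {
    // May or may not be Closeable, mirroring the cast in the original finalize().
    private final Object random;

    Codec(Object random) {
      this.random = random;
    }

    @Override
    public void close() throws IOException {
      // instanceof check instead of catching ClassCastException.
      if (random instanceof Closeable) {
        ((Closeable) random).close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    StreamBackedRandom random =
        new StreamBackedRandom(new ByteArrayInputStream(new byte[] {7}));
    try (Codec codec = new Codec(random)) {
      System.out.println(random.nextByte()); // prints 7
    }
    System.out.println(random.isClosed()); // prints true
  }
}
{code}

With this shape no finalizer is needed on the hot path at all; a finalize() on the random source would remain only as a fallback for callers that forget to close.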



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
