Misha Dmitriev created HADOOP-14523:
---------------------------------------
Summary: OpensslAesCtrCryptoCodec.finalize() holds excessive
amounts of memory
Key: HADOOP-14523
URL: https://issues.apache.org/jira/browse/HADOOP-14523
Project: Hadoop Common
Issue Type: Improvement
Reporter: Misha Dmitriev
I recently analyzed JVM heap dumps from Hive running a big workload. Two
excerpts from the analysis done with jxray (www.jxray.com) are given below. It
turns out that nearly half of the live memory is taken by objects awaiting
finalization, and the biggest offender among them is class
OpensslAesCtrCryptoCodec:
{code}
401,189K (39.7%) (1 of sun.misc.Cleaner)
<-- Java Static: sun.misc.Cleaner.first
400,572K (39.6%) (14001 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,
org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
<-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <--
sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static:
sun.misc.Cleaner.first
270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
<-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <--
j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next
<-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
---------------------
102,232K (10.1%) (1 of j.l.r.Finalizer)
<-- Java Static: java.lang.ref.Finalizer.unfinalized
101,676K (10.1%) (8613 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,
java.util.zip.ZipFile$ZipFileInflaterInputStream,
org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
<-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static:
java.lang.ref.Finalizer.unfinalized
{code}
This heap dump was taken using 'jmap -dump:live', which forces the JVM to run
a full GC before dumping the heap. So we are already looking at the heap right
after GC, and yet all these unfinalized objects are there. I think this happens
because the JVM always runs only one finalization thread, and thus the queue of
objects that need finalization may get processed too slowly. My understanding
is that finalization works as follows:
1. When GC runs, it discovers that an object x whose class overrides finalize()
is unreachable.
2. x is added to the finalization queue. So technically x is still reachable,
it occupies memory, and _all the objects that it references stay in memory as
well_.
3. The finalization thread processes objects from the finalization queue
serially, thus x may stay in memory for a long time.
4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory
for a long time, it's now in the Old Gen of the heap, so only a full GC can
clean it up.
5. When a full GC finally occurs, x gets cleaned up.
So finalization is formally reliable, but in practice it's quite possible that
many unreachable but not yet finalized objects fill up the heap. I guess we are
seeing all these OpensslAesCtrCryptoCodec objects while they are in step 3
above. And the really bad thing is that these objects in turn keep in memory a
whole lot of other stuff, in particular JobConf objects. Such a JobConf has
nothing to do with finalization, yet the GC cannot release it until the
corresponding OpensslAesCtrCryptoCodec is gone.
Here is the OpensslAesCtrCryptoCodec.finalize() method with my comments:
{code}
protected void finalize() throws Throwable {
  try {
    Closeable r = (Closeable) this.random;
    r.close(); // Relevant only when (random instanceof OsSecureRandom == true)
  } catch (ClassCastException e) {
  }
  super.finalize(); // Not needed, no finalize() in superclasses
}
{code}
So finalize() in this class, which may keep a whole tree of objects in memory,
is relevant only when this codec is configured to use the OsSecureRandom class.
The latter reads random bytes from the configured file, and needs finalization
to close the input stream associated with that file.
The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and add
it to the only class from this "family" that really needs it, OsSecureRandom.
That will ensure that only OsSecureRandom objects (if/when they are used) stay
in memory awaiting finalization, and no other, irrelevant objects.
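A minimal sketch of the shape of this change, not the actual patch. The class names RandomSourceSketch and CodecSketch, their fields, and the isClosed() helper are hypothetical stand-ins for OsSecureRandom and OpensslAesCtrCryptoCodec. The point is only that the object owning the stream carries finalize(), while the codec (and whatever conf it references) does not:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for OsSecureRandom: it owns the stream, so it owns the cleanup.
class RandomSourceSketch implements Closeable {
  private InputStream stream;

  RandomSourceSketch(InputStream stream) {
    this.stream = stream;
  }

  synchronized boolean isClosed() {
    return stream == null;
  }

  @Override
  public synchronized void close() throws IOException {
    if (stream != null) {
      try {
        stream.close();
      } finally {
        stream = null; // makes close() idempotent
      }
    }
  }

  // Only this small object waits in the finalization queue.
  @Override
  @SuppressWarnings({"deprecation", "removal"})
  protected void finalize() throws Throwable {
    close();
  }
}

// Hypothetical stand-in for the codec: no finalize(), so the codec and the
// configuration it references can be reclaimed as soon as they are unreachable.
class CodecSketch {
  private final Object conf;   // stands in for the JobConf reference
  private final Object random; // may or may not be a RandomSourceSketch

  CodecSketch(Object conf, Object random) {
    this.conf = conf;
    this.random = random;
  }

  Object getConf() {
    return conf;
  }

  Object getRandom() {
    return random;
  }
}
```

With this layout, an unreachable CodecSketch never enters the finalization queue; only the RandomSourceSketch does, and it references nothing but its stream.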
Note that this solution means that streams are still closed lazily. This, in
principle, may cause its own problems. So the most reliable fix would be to
call OsSecureRandom.close() explicitly when it's not needed anymore. But the
above fix is a necessary first step anyway: it will remove the most acute
problem with memory and will not make anything else worse than it currently
is.
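For illustration, explicit closing could look roughly like the sketch below. ExplicitCloseSketch, nextBytes() and generateIv() are hypothetical names, not Hadoop APIs, and the sketch assumes the random source is needed only within one lexical scope. In the real codec the random source lives as long as the codec, so the explicit close() call would have to be made by whoever disposes of the codec.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical random source that reads bytes from a configured file.
class ExplicitCloseSketch implements Closeable {
  private final InputStream stream;

  ExplicitCloseSketch(Path randomFile) throws IOException {
    this.stream = Files.newInputStream(randomFile);
  }

  // Fills 'out' completely, or fails if the file runs out of bytes.
  void nextBytes(byte[] out) throws IOException {
    int off = 0;
    while (off < out.length) {
      int n = stream.read(out, off, out.length - off);
      if (n < 0) {
        throw new EOFException("random source exhausted");
      }
      off += n;
    }
  }

  @Override
  public void close() throws IOException {
    stream.close();
  }

  // try-with-resources closes the stream deterministically, with no
  // dependence on GC or on the finalization thread.
  static byte[] generateIv(Path randomFile, int length) throws IOException {
    try (ExplicitCloseSketch rnd = new ExplicitCloseSketch(randomFile)) {
      byte[] iv = new byte[length];
      rnd.nextBytes(iv);
      return iv;
    }
  }
}
```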