[ 
https://issues.apache.org/jira/browse/HADOOP-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061340#comment-16061340
 ] 

John Zhuge commented on HADOOP-14523:
-------------------------------------

Great work [~mi...@cloudera.com]. Have you had a chance to rerun JXRay after 
the fix?

> OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-14523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14523
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>             Fix For: 2.9.0, 3.0.0-alpha4
>
>         Attachments: HADOOP-14523.01.patch, HADOOP-14523.02.patch
>
>
> I recently analyzed JVM heap dumps from Hive running a big workload. Two 
> excerpts from the analysis done with jxray (www.jxray.com) are given below. 
> It turns out that nearly half of live memory is taken by objects awaiting 
> finalization, and the biggest offender among them is the class 
> OpensslAesCtrCryptoCodec:
> {code}
>   401,189K (39.7%) (1 of sun.misc.Cleaner)
>      <-- Java Static: sun.misc.Cleaner.first
>   400,572K (39.6%) (14001 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
>      <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
>   270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
>      <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
> ---------------------
>   102,232K (10.1%) (1 of j.l.r.Finalizer)
>      <-- Java Static: java.lang.ref.Finalizer.unfinalized
>   101,676K (10.1%) (8613 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, java.util.zip.ZipFile$ZipFileInflaterInputStream, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
>      <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: java.lang.ref.Finalizer.unfinalized
> {code}
> This heap dump was taken using 'jmap -dump:live', which forces the JVM to run 
> a full GC before dumping the heap. So we are already looking at the heap 
> right after GC, and yet all these unfinalized objects are still there. I 
> think this happens because the JVM always runs only one finalization thread, 
> and thus the queue of objects that need finalization may get processed too 
> slowly. My understanding is that finalization works as follows:
> 1. When GC runs, it discovers that an object x whose class overrides 
> finalize() is unreachable.
> 2. x is added to the finalization queue. So technically x is still reachable, 
> it occupies memory, and _all the objects that it references stay in memory as 
> well_.
> 3. The finalization thread processes objects from the finalization queue 
> serially, so x may stay in memory for a long time.
> 4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory 
> for a long time, it is now in the Old Gen of the heap, so only a full GC can 
> clean it up.
> 5. When a full GC finally occurs, x gets cleaned up.
> So finalization is formally reliable, but in practice it's quite possible 
> that many unreachable but unfinalized objects flood the heap. I guess we are 
> seeing all these OpensslAesCtrCryptoCodec objects while they are in phase 3 
> above. And the really bad thing is that these objects in turn keep a whole 
> lot of other stuff in memory, in particular JobConf objects. Such a JobConf 
> has nothing to do with finalization, yet the GC cannot release it until the 
> corresponding OpensslAesCtrCryptoCodec is gone.
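
A backlog like the one described above can also be watched without taking a heap dump. The following is a minimal sketch, not part of the patch: it reads the JVM's own approximate count of objects pending finalization through the standard `MemoryMXBean`. The class name `FinalizerBacklog` is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Illustrative helper (hypothetical name), not part of Hadoop. */
class FinalizerBacklog {

    /** Returns the JVM's approximate number of objects awaiting finalization. */
    static int pendingFinalizationCount() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) {
        System.out.println("Objects pending finalization: "
                + pendingFinalizationCount());
    }
}
```

The count is only an approximation and says nothing about how much memory the pending objects retain, so a heap dump is still needed to see what they reference.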
> Here is the OpensslAesCtrCryptoCodec.finalize() method with my comments:
> {code}
> protected void finalize() throws Throwable {
>   try {
>     Closeable r = (Closeable) this.random;
>     // Relevant only when (random instanceof OsSecureRandom == true)
>     r.close();
>   } catch (ClassCastException e) {
>   }
>   super.finalize();  // Not needed, no finalize() in superclasses
> }
> {code}
> So finalize() in this class, which may keep a whole tree of objects in 
> memory, is relevant only when the codec is configured to use the 
> OsSecureRandom class. The latter reads random bytes from the configured file, 
> and needs finalization to close the input stream associated with that file.
> The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and 
> add it to the only class from this "family" that really needs it, 
> OsSecureRandom. That will ensure that only OsSecureRandom objects (if/when 
> they are used) stay in memory awaiting finalization, and no other, irrelevant 
> objects.
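
The shape of the suggested fix can be sketched as follows. This is an illustration only, not the attached patch: `StreamBackedRandom` and `LeanCodec` are hypothetical stand-ins for OsSecureRandom and OpensslAesCtrCryptoCodec, and the stream is passed in rather than opened from a configured file.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;

/** Hypothetical stand-in for OsSecureRandom: the only class that owns a stream. */
class StreamBackedRandom implements Closeable {
    private final InputStream stream;
    private boolean closed;

    StreamBackedRandom(InputStream stream) {
        this.stream = stream;
    }

    @Override
    public synchronized void close() {
        if (closed) {
            return; // closing twice is harmless
        }
        closed = true;
        try {
            stream.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Safety net: only this small object and its stream wait in the
    // finalization queue, not the codec or its configuration.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        close();
    }
}

/** Hypothetical stand-in for the codec: it declares no finalize(). */
class LeanCodec {
    private final Object conf; // e.g. a large JobConf-like object
    private final StreamBackedRandom random;

    LeanCodec(Object conf, StreamBackedRandom random) {
        this.conf = conf;
        this.random = random;
    }

    StreamBackedRandom random() {
        return random;
    }
}

class FinalizeRelocationDemo {
    /** True if the class itself declares a no-argument finalize(). */
    static boolean declaresFinalize(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("finalize") && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StreamBackedRandom random =
                new StreamBackedRandom(new ByteArrayInputStream(new byte[16]));
        LeanCodec codec = new LeanCodec(new Object(), random);
        System.out.println("codec declares finalize(): "
                + declaresFinalize(LeanCodec.class));
        System.out.println("random declares finalize(): "
                + declaresFinalize(StreamBackedRandom.class));
        codec.random().close();
    }
}
```

Because the random source holds no reference back to the codec, the codec and its configuration can be collected as soon as they become unreachable, while only the random source is routed through the finalization queue.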
> Note that this solution means that streams are still closed lazily. This, in 
> principle, may cause its own problems. So the most reliable fix would be to 
> call OsSecureRandom.close() explicitly when it's not needed anymore. But the 
> above fix is a necessary first step anyway: it will remove the most acute 
> memory problem and will not make anything else worse than it currently is.
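
The explicit-close variant mentioned at the end of the description might look like the sketch below. It is an assumption about one possible shape, not code from Hadoop: `TrackingRandom` and `ExplicitCloseCodec` are hypothetical names, and the `instanceof` check replaces the cast-and-catch used in the original finalize().

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;

/** Hypothetical Random that owns a resource, standing in for OsSecureRandom. */
class TrackingRandom extends Random implements Closeable {
    private volatile boolean closed;

    @Override
    public void close() {
        closed = true; // a real implementation would close its input stream here
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical codec that releases its random source when closed. */
class ExplicitCloseCodec implements Closeable {
    private final Random random;

    ExplicitCloseCodec(Random random) {
        this.random = random;
    }

    void generateSecureRandom(byte[] bytes) {
        random.nextBytes(bytes);
    }

    @Override
    public void close() {
        // Only some Random implementations hold a resource that needs closing.
        if (random instanceof Closeable) {
            try {
                ((Closeable) random).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

class ExplicitCloseDemo {
    public static void main(String[] args) {
        TrackingRandom random = new TrackingRandom();
        // try-with-resources closes the codec, and through it the random
        // source, as soon as the block exits.
        try (ExplicitCloseCodec codec = new ExplicitCloseCodec(random)) {
            codec.generateSecureRandom(new byte[16]);
        }
        System.out.println("random closed: " + random.isClosed());
    }
}
```

With this shape the stream is released deterministically and nothing depends on the finalizer thread, at the cost of requiring every caller to close the codec.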



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

