Hi Jaikiran,

Have a look at some blog posts by old friends. :) They might be helpful
(along with the other replies you received) in diagnosing the root cause
of the issue, in particular the part about native memory tracking.
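If you haven't used native memory tracking before, here is a rough
sketch of enabling and reading it (assuming you can add a flag to the
container's java command line and that jcmd from the same JDK is
available inside the container; <pid> is the Java process id):

   java -XX:NativeMemoryTracking=summary ...   (added to the existing startup flags)
   jcmd <pid> VM.native_memory summary

You can also record a baseline and diff against it later to see which
area is growing over time:

   jcmd <pid> VM.native_memory baseline
   jcmd <pid> VM.native_memory summary.diff

The summary breaks reserved/committed memory down by area (Java heap,
threads, class metadata, code cache, GC, internal, and so on), which
should show whether the growth that is tripping the cgroup limit is
really happening outside the Java heap. NMT adds a little overhead, so
you may not want to leave it enabled permanently.

The blog posts: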
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/

Regards,
Jeremy

On Fri, 2018-11-23 at 19:25 +0530, Jaikiran Pai wrote:
> Hi,
>
> I'm looking for some inputs in debugging a high memory usage issue
> (and subsequently the process being killed) in one of the
> applications I deal with. From what I have looked into so far, this
> appears to be something to do with the CMS collector, so I hope this
> is the right place for this question.
>
> A bit of background: the application I'm dealing with is an
> ElasticSearch server, version 1.7.5. We use Java 8:
>
> java version "1.8.0_172"
> Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
>
> To add to the complexity of debugging this issue, it runs as a docker
> container on docker version 18.03.0-ce, on a CentOS 7 host VM with
> kernel version 3.10.0-693.5.2.el7.x86_64.
>
> We have been noticing that this container/process keeps getting
> killed by the oom-killer every few days. The dmesg logs suggest that
> the process has hit the "limits" set at the docker cgroups level.
> After debugging this over the past day or so, I've reached a point
> where I can't make much sense of the data I'm looking at. The JVM
> process is started with the following params (of relevance):
>
> java -Xms2G -Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC ....
>
> As you can see, it uses the CMS collector with 75% occupancy of the
> tenured/old gen for initiating the GC.
>
> After a few hours/days of running, I notice that even though the CMS
> collector does run almost every hour or so, there are a huge number
> of objects _with no GC roots_ that never get collected. These objects
> internally seem to hold on to ByteBuffer(s) which (from what I see)
> as a result never get released, and the non-heap memory keeps
> building up until the process gets killed. To give an example, here's
> the jmap -histo output (only relevant parts):
>
>    1:    861642   196271400  [B
>    2:    198776    28623744  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
>    3:    676722    21655104  org.apache.lucene.store.ByteArrayDataInput
>    4:    202398    19430208  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
>    5:    261819    18850968  org.apache.lucene.util.fst.FST$Arc
>    6:    178661    17018376  [C
>    7:     31452    16856024  [I
>    8:    203911     8049352  [J
>    9:     85700     5484800  java.nio.DirectByteBufferR
>   10:    168935     5405920  java.util.concurrent.ConcurrentHashMap$Node
>   11:     89948     5105328  [Ljava.lang.Object;
>   12:    148514     4752448  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
>
> ....
>
> Total  5061244   418712248
>
> The above output is without the "live" option. Running jmap
> -histo:live returns something like (again only relevant parts):
>
>   13:     31753     1016096  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
>   ...
>   44:       887      127728  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
>   ...
>   50:      3054       97728  org.apache.lucene.store.ByteArrayDataInput
>   ...
>   59:       888       85248  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
>
> Total  1177783   138938920
>
> Notice the vast difference between the live and non-live instance
> counts for the same classes. This isn't just in one "snapshot".
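One thing worth keeping in mind when comparing those two outputs:
"jmap -histo:live" forces a full GC before the histogram is taken and
then counts only reachable objects, whereas plain "jmap -histo" also
counts objects that are already unreachable but have simply not been
collected yet. Roughly:

   jmap -histo <pid>        (every object currently on the heap, reachable or not)
   jmap -histo:live <pid>   (forces a full GC first, then counts only live objects)

In both outputs the second column is the instance count and the third
is the total bytes. So a large gap between the two mainly shows how
much unreachable garbage happened to be sitting in the heap at that
moment, not necessarily that those objects will never be collected.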
> I have been monitoring this for more than a day and this pattern
> continues. Even taking heap dumps and looking at them with tools like
> VisualVM shows that these instances have "no GC root", and I have
> also checked the GC log files to confirm that the CMS collector does
> occasionally run. However, these objects never seem to get collected.
>
> I realize this data may not be enough to narrow down the issue, but
> what I am looking for is some kind of help/input/hints/suggestions on
> what I should be trying, to figure out why these instances aren't
> GCed. Is this something that's expected in certain situations?
>
> -Jaikiran

--
Jeremy Whiting
Senior Software Engineer, Middleware Performance Team
Red Hat
------------------------------------------------------------
Registered Address: Red Hat UK Ltd, Peninsular House, 30 Monument
Street, London, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903.
Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Eric
Shander (US)

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use