Thanks for the quick response Bernd! The app is Hadoop HDFS Namenode (not sure if you know something about it)
For your points: - try to check heapdump or object histogram to get a feeling for the memory usage We are doing it over the weekend and hopefully we can get some idea about untraceable objects. Any other things we need to pay attention to? - if you have some active caching which reacts on settings, memory usage or available ram check if those settings are ok - check the finalizer queue Not sure about the above too, I don’t think we have active caching based on our JVM options. I need to check the finalizer queues if possible. - there might be not enough FullGCs to free finalizer, soft references, native buffers or class(loaders) There is no Full GC in the GC log and TBH we don’t want Full GC since this will cause long delay which is not acceptable. I evaluated G1GC but it doesn’t work well especially it needs another 50% off heap memory size for RS and CS. Thanks, Fengnan > On Jun 14, 2019, at 2:25 PM, Bernd Eckenfels <[email protected]> wrote: > > Hello, > > Hard to say without knowing the app, but I can give you a few hints: > > - try to check heapdump or object histogram to get a feeling for the memory > usage > - if you have some active caching which reacts on settings, memory usage or > available ram check if those settings are ok > - check the finalizer queue > - there might be not enough FullGCs to free finalizer, soft references, > native buffers or class(loaders) > > There might be some pathological cases where GC falls flat, but it is rather > unlikely compared to application errors. > > Did you tried G1 yet? > > Gruss > Bernd > > > -- > http://bernd.eckenfels.net > > Von: hotspot-gc-use <[email protected]> im Auftrag von > Fengnan Li <[email protected]> > Gesendet: Freitag, Juni 14, 2019 11:12 PM > An: [email protected] > Betreff: CMS major GC doesn't clean up heap enough > > Hi, > > I am using CMS in in the app with very large heap size (200G), and I found > that when the major GC was triggered, not much objects were cleaned up, > ending with almost the same heap usage. Can someone help? > > Our JVM option: > > -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=40 > -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC > -XX:GCLogFileSize=104857600 -XX:InitialHeapSize=214748364800 > -XX:+ManagementServer -XX:MaxHeapSize=214748364800 -XX:MaxNewSize=17179869184 > -XX:MaxTenuringThreshold=6 -XX:NewSize=17179869184 -XX:OldPLABSize=16 > -XX:ParGCCardsPerStrideChunk=32768 -XX:+PrintGC -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UnlockDiagnosticVMOptions > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC > -XX:+UseCondCardMark -XX:+UseGCLogFileRotation -XX:+UseParNewGC > > Some part of the Major GC log: > > 2019-05-06T17:32:46.822+0000: 485.965: [GC (CMS Initial Mark) [1 > CMS-initial-mark: 77254857K(192937984K)] 79190292K(208037504K), 0.1421191 > secs] [Times: user=1.96 sys=0.04, real=0.14 secs] > 2019-05-06T17:32:46.964+0000: 486.108: [CMS-concurrent-mark-start] > 2019-05-06T17:32:52.757+0000: 491.900: [GC (Allocation Failure) > 2019-05-06T17:32:52.757+0000: 491.900: [ParNew: > 15099520K->1677696K(15099520K), 2.5151990 secs] > 92354377K->80544859K(208037504K), 2.5153318 secs] [Times: user=50.76 > sys=1.14, real=2.51 secs] > 2019-05-06T17:33:01.126+0000: 500.269: [GC (Allocation Failure) > 2019-05-06T17:33:01.126+0000: 500.270: [ParNew: > 15099520K->1677696K(15099520K), 2.2934176 secs] > 93966683K->81938923K(208037504K), 2.2935316 secs] [Times: user=46.49 > sys=1.09, real=2.29 secs] > 2019-05-06T17:33:09.430+0000: 508.573: [GC (Allocation Failure) > 2019-05-06T17:33:09.430+0000: 508.574: [ParNew: > 15099520K->1677696K(15099520K), 1.8824589 secs] > 95360747K->83297959K(208037504K), 1.8825875 secs] [Times: user=33.19 > sys=1.00, real=1.89 secs] > 2019-05-06T17:33:13.116+0000: 512.259: [CMS-concurrent-mark: 19.456/26.152 > secs] [Times: user=265.04 sys=5.29, real=26.15 secs] > 2019-05-06T17:33:13.116+0000: 512.260: [CMS-concurrent-preclean-start] > 2019-05-06T17:33:14.382+0000: 513.526: [GC (Allocation Failure) > 2019-05-06T17:33:14.382+0000: 513.526: [ParNew: > 15099520K->1345724K(15099520K), 2.2104188 secs] > 96719783K->84244360K(208037504K), 2.2105118 secs] [Times: user=42.36 > sys=0.88, real=2.21 secs] > 2019-05-06T17:33:18.620+0000: 517.764: [CMS-concurrent-preclean: 3.284/5.504 > secs] [Times: user=48.81 sys=1.03, real=5.51 secs] > 2019-05-06T17:33:18.620+0000: 517.764: > [CMS-concurrent-abortable-preclean-start] > 2019-05-06T17:33:21.094+0000: 520.238: [GC (Allocation Failure) > 2019-05-06T17:33:21.094+0000: 520.238: [ParNew: > 14767548K->1677696K(15099520K), 1.7077210 secs] > 97666184K->84979595K(208037504K), 1.7078275 secs] [Times: user=36.05 > sys=0.39, real=1.71 secs] > 2019-05-06T17:33:23.280+0000: 522.424: [CMS-concurrent-abortable-preclean: > 2.951/4.660 secs] [Times: user=41.72 sys=0.63, real=4.65 secs] > 2019-05-06T17:33:23.281+0000: 522.424: [GC (CMS Final Remark) [YG occupancy: > 3134013 K (15099520 K)]2019-05-06T17:33:23.281+0000: 522.424: [Rescan > (parallel) , 0.1699867 secs]2019-05-06T17:33:23.451+0000: 522.594: [weak refs > processing, 0.0001797 secs]2019-05-06T17:33:23.451+0000: 522.594: [class > unloading, 0.0121369 secs]2019-05-06T17:33:23.463+0000: 522.607: [scrub > symbol table, 0.0050611 secs]2019-05-06T17:33:23.468+0000: 522.612: [scrub > string table, 0.0007207 secs][1 CMS-remark: 83301899K(192937984K)] > 86435913K(208037504K), 0.1900326 secs] [Times: user=3.61 sys=0.03, real=0.18 > secs] > 2019-05-06T17:33:23.471+0000: 522.614: [CMS-concurrent-sweep-start] > 2019-05-06T17:33:27.826+0000: 526.969: [GC (Allocation Failure) > 2019-05-06T17:33:27.826+0000: 526.969: [ParNew: > 15099520K->1677696K(15099520K), 2.5180474 secs] > 98390054K->86741207K(208037504K), 2.5181662 secs] [Times: user=45.81 > sys=1.33, real=2.52 secs] > 2019-05-06T17:33:34.762+0000: 533.906: [GC (Allocation Failure) > 2019-05-06T17:33:34.762+0000: 533.906: [ParNew: > 15099520K->1641794K(15099520K), 1.9212927 secs] > 100160159K->87911207K(208037504K), 1.9214017 secs] [Times: user=36.08 > sys=0.88, real=1.92 secs] > 2019-05-06T17:33:41.271+0000: 540.414: [GC (Allocation Failure) > 2019-05-06T17:33:41.271+0000: 540.414: [ParNew: > 15063618K->1677696K(15099520K), 2.3496494 secs] > 101330218K->89115463K(208037504K), 2.3497471 secs] [Times: user=46.58 > sys=0.87, real=2.36 secs] > 2019-05-06T17:33:47.963+0000: 547.107: [GC (Allocation Failure) > 2019-05-06T17:33:47.963+0000: 547.107: [ParNew: > 15099520K->1677696K(15099520K), 2.0292300 secs] > 102536721K->90319393K(208037504K), 2.0293108 secs] [Times: user=38.49 > sys=0.84, real=2.03 secs] > 2019-05-06T17:33:50.641+0000: 549.784: [CMS-concurrent-sweep: 18.349/27.170 > secs] [Times: user=202.69 sys=4.94, real=27.18 secs] > 2019-05-06T17:33:50.641+0000: 549.784: [CMS-concurrent-reset-start] > 2019-05-06T17:33:51.305+0000: 550.448: [CMS-concurrent-reset: 0.663/0.664 > secs] [Times: user=1.28 sys=0.04, real=0.66 secs] > 2019-05-06T17:33:53.304+0000: 552.448: [GC (CMS Initial Mark) [1 > CMS-initial-mark: 88641697K(192937984K)] 100346893K(208037504K), 0.2599963 > secs] [Times: user=4.83 sys=0.03, real=0.26 secs] > > I am not sure if this is indicating some memory leak or GC is just not > working properly. I have the same app in another machine running with the > same JVM option works fine under 1/10th of the traffic as this one. > > Thanks, > Fengnan
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ hotspot-gc-use mailing list [email protected] https://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
