Look at the promotion rates (sort of) in the "context around a single instance": subtract the 
"after" size of one line from the "after" size of the next line.  I see

        Size                                    Duration        PromotedK
        -------------------------------         ---------       ---------
        49545909K->47870050K(84724992K)       0.049020                          
                                                     
        49547874K->47872545K(84724992K)       0.047341            2495
        49550369K->47876404K(84724992K)       0.049631            3859
        49554228K->47892320K(84724992K)       0.048118           15916
        49570144K->48422333K(84724992K)      38.098495          530013
        50100157K->48528020K(84724992K)       0.067286          105687
        50205844K->48541030K(84724992K)       0.069611           13010
        50218854K->48542651K(84724992K)       0.051600            1621
        50220475K->48545932K(84724992K)       0.067547            3281
        50223756K->48540797K(84724992K)       0.063965           -5135
        50218621K->48545033K(84724992K)       0.051678            4236

(I hope that survives mail reformatting, but you get the idea.)

The interval around the long collection is not like the rest of the context.  
It promotes 100x the amount of memory, with some ramp up in the collection 
before and soma ramp down in the 2 collections after.  What's up with that?

Even 100x longer wouldn't be 38 seconds.  And the collections before and after 
copy more data but don't take longer.  So there's something that doesn't scale 
about the objects being copied, too.  Chasing a long list?  Copying a big array 
of references that then also have to be promoted?  Those would both happen all 
in one collection, not smeared out over 4 collections.  But it seems like 
application behavior is involved, not just a failure of the collector.  Do you 
know what the application is doing at that time?  Is is doing that at the other 
times with long pauses?  Do the contexts for the other long pauses look like 
that?

Or it could be something else.  It often is. :-)  I'm just looking under the 
streetlight.

                        ... peter

On 10/28/16 03:43 PM, David Ely wrote:
While typical ParNew GC times are 50ms, our application is occasionally hitting 
ParNew times that are over 15 seconds for one of our customers, and we have no 
idea why. Looking at the full GC log file:

382250 ParNew GCs are < 1 second
9303 are 100ms to 1 second
1267 are 1 second to 2 seconds
99 are 2 seconds to 10 seconds
24 are > 10 seconds, 48 seconds being the max

The long ones are somewhat bursty as you can see from looking at the line 
numbers in the GC log:

$ egrep -n  '(ParNew.*real=[1-9][0-9]\)' gc.log.0

12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 
20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 
33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, 
real=16.99 secs]
43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 
67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 
49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, 
real=12.75 secs]
44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 
68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 
50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, 
real=12.73 secs]
50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 
78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 
49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, 
real=18.66 secs]
56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 
88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 
50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, 
real=16.09 secs]
78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 
123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 
43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, 
real=10.99 secs]
79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 
124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 
44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, 
real=11.55 secs]
79504:2016-10-22T06:09:36.983+0000: 125054.672: [GC2016-10-22T06:09:36.984+0000: 
125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 
46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, 
real=14.15 secs]
79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 
126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 
48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, 
real=14.45 secs]
80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 
126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 
50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, 
real=15.70 secs]
89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 
152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 
48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, 
real=11.57 secs]
90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 
153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 
50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, 
real=17.34 secs]
90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 
154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 
51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, 
real=13.17 secs]
261329:2016-10-26T00:06:44.499+0000: 448882.189: [GC2016-10-26T00:06:44.500+0000: 
448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 
40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, 
real=17.58 secs]
261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 
449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 
43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, 
real=13.93 secs]
262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 
449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 
45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, 
real=40.33 secs]
262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 
450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 
46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, 
real=17.75 secs]
262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 
450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 
47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, 
real=35.49 secs]
262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 
450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 
49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, 
real=21.41 secs]
262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 
450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 
51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, 
real=48.28 secs]
345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 
550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 
46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, 
real=33.86 secs]
345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 
550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 
47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, 
real=18.98 secs]
345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 
550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 
47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, 
real=30.97 secs]
345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 
551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 
49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, 
real=38.09 secs]

Context around a single instance is fairly normal:

345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 
551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 
49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, 
real=0.05 secs]
345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 
551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 
49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, 
real=0.05 secs]
345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 
551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 
49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, 
real=0.05 secs]
345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 
551167.488: [ParNew: 1690557K->26694K(1887488K), 0.0471170 secs] 
49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, 
real=0.05 secs]
345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 
551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 
49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, 
real=38.09 secs]
345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 
551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 
50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, 
real=0.07 secs]
345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 
551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 
50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, 
real=0.07 secs]
345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 
551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 
50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, 
real=0.05 secs]
345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 
551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 
50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, 
real=0.06 secs]
345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 
551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 
50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, 
real=0.06 secs]
345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 
551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 
50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, 
real=0.05 secs]

Since the user times are high as well, I don't think this could be swapping.

Here are the hard-earned set of JVM arguments that we're using:

-d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \
  -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \
  -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \
  -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \
  -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \
  -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \
  -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \
  -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \
  -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \
  -XX:+UseGCLogFileRotation \
  -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \
  -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log

This is on Linux with Java 1.7.0_72.

Does this look familiar to anyone? Alternatively, are there some more JVM 
options that we could include to get more information?

One of the first things that we'll try is to move to a later JVM, but it will 
be easier to get the customer to do that if we can point to a specific issue 
that has been addressed.

Thanks for your help.

David


_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

Reply via email to