+1 on disabling THP
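
For reference, THP can usually be checked (and disabled at runtime) via sysfs. The path below is the common upstream one; I haven't seen this customer's kernel, and RHEL 6 kernels expose it as /sys/kernel/mm/redhat_transparent_hugepage instead:

```shell
# Show the current THP mode; the bracketed value ([always]/[madvise]/[never]) is active.
THP_DIR=/sys/kernel/mm/transparent_hugepage
if [ -r "$THP_DIR/enabled" ]; then
    cat "$THP_DIR/enabled"
else
    echo "no THP sysfs interface at $THP_DIR (on RHEL 6, try /sys/kernel/mm/redhat_transparent_hugepage)"
fi

# To disable THP and its defrag pass at runtime (as root):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

Worth making the change persistent (rc.local or a tuned profile) so it survives reboots.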

Charlie 

> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich <vita...@gmail.com> wrote:
> 
> David,
> 
> Ask them to turn off THP - it's a known source of large latency due to the 
> kernel doing page defragmentation; your app takes a page fault, and boom - 
> the kernel may start doing defragmentation to make a huge page available.  
> You can search online for THP issues.  The symptoms are similar to yours - 
> very high sys time.
> 
> If they turn it off and still get the same lengthy ParNew pauses, then it's 
> clearly something else, but at least we'll eliminate THP as the culprit.
> 
>> On Saturday, October 29, 2016, David Ely <david....@unboundid.com> wrote:
>> Thank you for the response. Yes: meminfo (full output below) shows ~80GB 
>> of AnonHugePages, which is pretty close to the size of the JVM heap. Looking 
>> back through previous information that we have from this customer, 
>> transparent huge pages have been turned on for years. We've asked them 
>> whether anything else might have changed in this environment.
>> 
>> Are there any other JVM options that we could enable that would shed light 
>> on what's going on within ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB 
>> -XX:PrintFLSStatistics=1 show anything useful?
>> 
>> David
>> 
>> 
>> MemTotal:       264396572 kB
>> MemFree:         2401576 kB
>> Buffers:          381564 kB
>> Cached:         172673120 kB
>> SwapCached:            0 kB
>> Active:         163439836 kB
>> Inactive:       90737452 kB
>> Active(anon):   76910848 kB
>> Inactive(anon):  4212580 kB
>> Active(file):   86528988 kB
>> Inactive(file): 86524872 kB
>> Unevictable:           0 kB
>> Mlocked:               0 kB
>> SwapTotal:      16236540 kB
>> SwapFree:       16236540 kB
>> Dirty:             14552 kB
>> Writeback:             0 kB
>> AnonPages:      81111768 kB
>> Mapped:            31312 kB
>> Shmem:               212 kB
>> Slab:            6078732 kB
>> SReclaimable:    5956052 kB
>> SUnreclaim:       122680 kB
>> KernelStack:       41296 kB
>> PageTables:       171324 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:    148434824 kB
>> Committed_AS:   93124984 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:      686780 kB
>> VmallocChunk:   34225639420 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:  80519168 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> DirectMap4k:        5132 kB
>> DirectMap2M:     1957888 kB
>> DirectMap1G:    266338304 kB
>> 
>> 
>>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich <vita...@gmail.com> 
>>> wrote:
>>> 
>>> 
>>>> On Friday, October 28, 2016, David Ely <david....@unboundid.com> wrote:
>>>> While typical ParNew GC times are 50ms, our application is occasionally 
>>>> hitting ParNew times that are over 15 seconds for one of our customers, 
>>>> and we have no idea why. Looking at the full GC log file:
>>>> 
>>>> 382250 ParNew GCs are < 100 ms
>>>> 9303 are 100ms to 1 second
>>>> 1267 are 1 second to 2 seconds
>>>> 99 are 2 seconds to 10 seconds
>>>> 24 are > 10 seconds, 48 seconds being the max
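
(Inline note: the bucketing above can be reproduced with a small awk function keyed off the `real=` field of the `[Times: ...]` stanza — this assumes each GC entry sits on one line in the raw log; the wrapping in the quotes below is just the mail client.)

```shell
# Bucket ParNew wall-clock pauses from a CMS/ParNew GC log, one entry per line.
bucket_parnew() {
    awk '$0 ~ /ParNew/ && match($0, /real=[0-9]+\.[0-9]+/) {
        # RSTART/RLENGTH come from the second match(); skip the 5-char "real=" prefix.
        t = substr($0, RSTART + 5, RLENGTH - 5) + 0
        if      (t < 0.1)  b["< 100ms"]++
        else if (t < 1.0)  b["100ms - 1s"]++
        else if (t < 2.0)  b["1s - 2s"]++
        else if (t < 10.0) b["2s - 10s"]++
        else               b["> 10s"]++
    }
    END { for (k in b) print k ": " b[k] }' "$1"
}

# Usage: bucket_parnew gc.log.0
```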
>>>> 
>>>> The long ones are somewhat bursty as you can see from looking at the line 
>>>> numbers in the GC log:
>>>> 
>>>> $ egrep -n 'ParNew.*real=[1-9][0-9]' gc.log.0 
>>>> 
>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: 
>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 
>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 
>>>> 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 
>>>> sys=14.37, real=16.99 secs] 
>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: 
>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 
>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 
>>>> 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 
>>>> sys=11.05, real=12.75 secs] 
>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: 
>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 
>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 
>>>> 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 
>>>> sys=11.29, real=12.73 secs] 
>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: 
>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 
>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 
>>>> 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 
>>>> sys=17.45, real=18.66 secs] 
>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: 
>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 
>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 
>>>> 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 
>>>> sys=16.84, real=16.09 secs] 
>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: 
>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 
>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 
>>>> 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 
>>>> sys=9.84, real=10.99 secs] 
>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: 
>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 
>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 
>>>> 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 
>>>> sys=10.01, real=11.55 secs] 
>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: 
>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 
>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 
>>>> 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 
>>>> sys=12.71, real=14.15 secs] 
>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: 
>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 
>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 
>>>> 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 
>>>> sys=13.04, real=14.45 secs] 
>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: 
>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 
>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 
>>>> 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 
>>>> sys=14.75, real=15.70 secs] 
>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: 
>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 
>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 
>>>> 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 
>>>> sys=10.01, real=11.57 secs] 
>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: 
>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 
>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 
>>>> 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 
>>>> sys=15.81, real=17.34 secs] 
>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: 
>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 
>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 
>>>> 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 
>>>> sys=11.54, real=13.17 secs] 
>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: 
>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 
>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 
>>>> 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 
>>>> sys=14.79, real=17.58 secs] 
>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: 
>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 
>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 
>>>> 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 
>>>> sys=11.45, real=13.93 secs] 
>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: 
>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 
>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 
>>>> 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 
>>>> sys=43.07, real=40.33 secs] 
>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: 
>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 
>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 
>>>> 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 
>>>> sys=16.10, real=17.75 secs] 
>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: 
>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 
>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 
>>>> 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 
>>>> sys=31.87, real=35.49 secs] 
>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: 
>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 
>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 
>>>> 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 
>>>> sys=19.46, real=21.41 secs] 
>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: 
>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 
>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 
>>>> 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 
>>>> sys=53.55, real=48.28 secs] 
>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: 
>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 
>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 
>>>> 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 
>>>> sys=30.59, real=33.86 secs] 
>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: 
>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: 
>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 
>>>> 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 
>>>> sys=17.25, real=18.98 secs] 
>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: 
>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 
>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 
>>>> 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 
>>>> sys=33.07, real=30.97 secs] 
>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: 
>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 
>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 
>>>> 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 
>>>> sys=34.79, real=38.09 secs] 
>>>> 
>>>> Context around a single instance is fairly normal:
>>>> 
>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: 
>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 
>>>> 1685858K->8851K(1887488K), 0.0480250 secs] 
>>>> 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 
>>>> sys=0.02, real=0.05 secs] 
>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: 
>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 
>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 
>>>> 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 
>>>> sys=0.04, real=0.05 secs] 
>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: 
>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 
>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 
>>>> 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 
>>>> sys=0.04, real=0.05 secs] 
>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: 
>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 
>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 
>>>> 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 
>>>> sys=0.02, real=0.05 secs] 
>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: 
>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 
>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 
>>>> 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 
>>>> sys=34.79, real=38.09 secs] 
>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: 
>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 
>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 
>>>> 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 
>>>> sys=0.05, real=0.07 secs] 
>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: 
>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 
>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 
>>>> 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 
>>>> sys=0.03, real=0.07 secs] 
>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: 
>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 
>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 
>>>> 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 
>>>> sys=0.03, real=0.05 secs] 
>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: 
>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 
>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 
>>>> 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 
>>>> sys=0.03, real=0.06 secs] 
>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: 
>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 
>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 
>>>> 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 
>>>> sys=0.01, real=0.06 secs] 
>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: 
>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 
>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 
>>>> 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 
>>>> sys=0.03, real=0.05 secs] 
>>>> 
>>>> Since the user times are high as well, I don't think this could be 
>>>> swapping.
>>> 
>>> Can you ask the customer if they're using transparent hugepages (THP)? 
>>>> 
>>>> Here are the hard-earned set of JVM arguments that we're using:
>>>> 
>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \
>>>>   -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \
>>>>   -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \
>>>>   -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \
>>>>   -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \
>>>>   -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC 
>>>> -XX:+UseBiasedLocking \
>>>>   -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \
>>>>   -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \
>>>>   -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \
>>>>   -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \
>>>>   -XX:+UseGCLogFileRotation \
>>>>   -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \
>>>>   -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log
>>>> 
>>>> This is on Linux with Java 1.7.0_72.
>>>> 
>>>> Does this look familiar to anyone? Alternatively, are there some more JVM 
>>>> options that we could include to get more information? 
>>>> 
>>>> One of the first things that we'll try is to move to a later JVM, but it 
>>>> will be easier to get the customer to do that if we can point to a 
>>>> specific issue that has been addressed.
>>>> 
>>>> Thanks for your help.
>>>> 
>>>> David
>>> 
>>> 
>>> -- 
>>> Sent from my phone
>> 
> 
> 
> -- 
> Sent from my phone
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use