Dear JVM experts, Recently we found GC spike (long STW minor GC), and sys time is high when we GC time is high. Normally sys time is near 0 seconds and minor GC is less than 500ms.
From http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-October/020630.html and https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/, high sys time could be caused by operation system problem/VM related problem/memory constraint/disk IO pressure/Transparent Huge Pages. I checked them one by one, don't find any clue, could u please kindly provide suggestion? Thanks in advance! 1.operation system problem --We have enough CPU/memory/disk (48 cpu cores + 373 RAM with 160G heap, disk is enough), and there is no error in /var/log/dmesg 2. memory constraint -- We have enough available memory. available memory (free -m) is 263G 3. disk IO pressure -- Not find issue from disk info from prometheus node exporter. Granularity is 15s, and I can't find counterpart of avgqu-sz & util metrics (disk IO util and saturation metrics) which is part of iostat. It could be caused by big Granularity??? 4. VM related problem -- We are using physical machine 5. Transparent Huge Pages. It is madvise. It could be a problem, but we don't have this issue previously. It has been running for nearly 20 weeks. *cat /sys/kernel/mm/transparent_hugepage/enabledalways [madvise] never* *JDK version:* OpenJDK Runtime Environment, 1.8.0_222-b10 *Java Opts:* -javaagent:/server/jmx_prometheus_javaagent-0.12.0.jar=xxxx:/server/config.yaml -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=xxxx -Dcom.sun.management.jmxremote.rmi.port=xxxx -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Xloggc:/server/xxxx.log -XX:+PrintGCDateStamps -XX:AutoBoxCacheMax=1000000 -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=50 -XX:InitiatingHeapOccupancyPercent=70 -XX:+ParallelRefProcEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+UseStringDeduplication -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xms160g -Xmx160g -XX:+HeapDumpOnOutOfMemoryError *Snippet of GC log:* 2020-01-20T07:27:03.166+0000: 2756.665: [GC pause (G1 Evacuation Pause) (young), *6.2899024 secs*] [Parallel Time: 6255.0 ms, GC Workers: 33] [GC Worker Start (ms): Min: 2756664.9, Avg: 2756665.5, Max: 2756666.1, Diff: 1.2] [Ext Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 5.3, Diff: 5.3, Sum: 16.8] [Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 25.6] [Processed Buffers: Min: 0, Avg: 1.6, Max: 4, Diff: 4, Sum: 53] [Scan RS (ms): Min: 142.0, Avg: 145.3, Max: 146.4, Diff: 4.4, Sum: 4794.1] [Code Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5, Sum: 8.8] * [Object Copy (ms): Min: 6100.1, Avg: 6101.8, Max: 6106.5, Diff: 6.4, Sum: 201358.4]* [Termination (ms): Min: 0.1, Avg: 5.2, Max: 6.7, Diff: 6.6, Sum: 172.9] [Termination Attempts: Min: 1, Avg: 1353.0, Max: 1476, Diff: 1475, Sum: 44650] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 7.0] [GC Worker Total (ms): Min: 6253.4, Avg: 6254.1, Max: 6254.7, Diff: 1.2, Sum: 206383.7] [GC Worker End (ms): Min: 2762919.4, Avg: 2762919.6, Max: 2762919.8, Diff: 0.4] [Code Root Fixup: 0.6 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 0.7 ms, GC Workers: 33] [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4] [Table Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 2.0] [Clear CT: 4.0 ms] [Other: 29.6 ms] [Choose CSet: 0.1 ms] [Ref Proc: 10.3 ms] [Ref Enq: 0.6 ms] [Redirty Cards: 11.3 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 6.5 ms] [Eden: 72576.0M(72576.0M)->0.0B(80896.0M) Survivors: 9344.0M->1024.0M Heap: 83520.0M(160.0G)->11046.9M(160.0G)] * [Times: user=27.19 sys=162.28, real=6.30 secs] * 2020-01-20T06:59:23.382+0000: 1096.881: [GC pause (G1 Evacuation Pause) (young) (initial-mark), *4.1248088 secs*] [Parallel Time: 4098.0 ms, GC Workers: 33] [GC Worker Start (ms): Min: 1096882.1, Avg: 1096882.8, Max: 1096883.2, Diff: 1.2] [Ext Root Scanning (ms): Min: 4.0, Avg: 4.8, Max: 6.1, Diff: 2.0, Sum: 159.7] [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff: 1.1, Sum: 9.5] [Processed Buffers: Min: 0, Avg: 1.3, Max: 6, Diff: 6, Sum: 43] * [Scan RS (ms): Min: 2001.2, Avg: 2012.2, Max: 2013.4, Diff: 12.2, Sum: 66401.0]* [Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 10.7, Diff: 10.7, Sum: 18.5] * [Object Copy (ms): Min: 2039.3, Avg: 2049.2, Max: 2079.5, Diff: 40.2, Sum: 67623.1]* [Termination (ms): Min: 0.0, Avg: 29.6, Max: 39.7, Diff: 39.7, Sum: 978.0] [Termination Attempts: Min: 1, Avg: 6587.0, Max: 8068, Diff: 8067, Sum: 217372] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 7.9] [GC Worker Total (ms): Min: 4096.3, Avg: 4096.9, Max: 4097.7, Diff: 1.4, Sum: 135197.8] [GC Worker End (ms): Min: 1100979.5, Avg: 1100979.7, Max: 1100979.9, Diff: 0.4] [Code Root Fixup: 0.6 ms] [Code Root Purge: 0.2 ms] [String Dedup Fixup: 1.0 ms, GC Workers: 33] [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Table Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.7, Diff: 0.7, Sum: 1.4] [Clear CT: 3.4 ms] [Other: 21.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 9.1 ms] [Ref Enq: 0.9 ms] [Redirty Cards: 4.3 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 5.3 ms] [Eden: 81184.0M(81184.0M)->0.0B(72576.0M) Survivors: 736.0M->9344.0M Heap: 83508.0M(160.0G)->10944.0M(160.0G)] * [Times: user=68.40 sys=9.11, real=4.13 secs] * Thanks, Roy
_______________________________________________ hotspot-gc-use mailing list hotspot-gc-use@openjdk.java.net https://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use