On 12/6/18 2:28 AM, Ross Black wrote:
Hi,

When running Kafka servers on Linux, the zookeeper sessions have been expiring frequently. After lots of investigation, I discovered some threads (not the zookeeper threads)
taking a very long time to get to the safepoint  (frequently > 2 seconds)


A quick google shows that Kafka is using writeable mmaps. By itself this is already terrible if you have a large amount of data to write, but in combination with the JVM it is horrendous. An innocent memory access can cause the thread to be blocked on I/O, and since the dirtying rate can be much higher that cleaning rate, you can end up with the thread blocked for large amounts of time. The JVM doesn't know anything about it, so as far as it knows the thread is at a non-safe point.


If Kafka has a setting that disables mmap, I recommend trying it. The thread will still block, but at least it won't prevent Java from reaching a safepoint.


The output from a pause looks like:


|
[2018-12-05T22:54:58.034+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0 [2018-12-05T22:54:58.034+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0 [2018-12-05T22:54:58.034+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0
...(repeated continually forapprox 2seconds)

[2018-12-05T22:55:00.548+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0 [2018-12-05T22:55:00.549+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0 [2018-12-05T22:55:00.550+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_running _has_called_back 0_at_poll_safepoint 0 [2018-12-05T22:55:00.551+0000][safepoint]...found polling page returnexception at pc =0x00007f3397c7deff,stub =0x00007f338fb4c900 [2018-12-05T22:55:00.551+0000][safepoint]Thread:0x00007f23840d1800[0x17e4]State:_call_back _has_called_back 0_at_poll_safepoint 0
[2018-12-05T22:55:00.551+0000][safepoint]Waitingfor3thread(s)to block
[2018-12-05T22:55:00.551+0000][safepoint]Enteringsafepoint region:G1IncCollectionPause
[2018-12-05T22:55:00.566+0000][safepoint]Leavingsafepoint region
[2018-12-05T22:55:00.566+0000][safepoint]Totaltime forwhich application threads were stopped:2.5331635seconds,Stoppingthreads took:2.5180973seconds
|
=> BOOT @ 2018-12-05T03:12:26+00:00, epoch 1543979546
=> 2018-12-05T22:55:00+00:00: epoch 1544050500, boot offset 70954

thread 0x17e4 = 6116

Running 'perf' to sample threads:
  sudo perf record -F 99 --proc-map-timeout 5000 --sample-cpu --switch-events

  sudo perf script --tid 6116 > thread-6116.txt

|
ReplicaFetcherT6116[007]70951.936265:47425865cycles:ppp:7f338fc9ad8aregister_finalizer Runtime1stub (/tmp/perf-3239.map) ReplicaFetcherT6116[007]70951.987993:61736469cycles:ppp:7f33977d3165Ljava/util/ArrayList;::add (/tmp/perf-3239.map)
/**big time gap here/
ReplicaFetcherT6116[007]70954.862422:70423359cycles:ppp:7f3397b26fe0Ljava/util/LinkedHashMap;::afterNodeInsertion (/tmp/perf-3239.map) ReplicaFetcherT6116[007]70954.955571:58601064cycles:ppp:7f3397782118Ljava/lang/String;::charAt (/tmp/perf-3239.map) ReplicaFetcherT6116[007]70955.010857:56837522cycles:ppp:7f33977347b6Ljava/util/concurrent/ConcurrentHashMap;::tabAt (/tmp/perf-3239.map)

|


  sudo perf script --cpu 7 > cpu-7.txt

|
ReplicaFetcherT6116[007]70951.864747:52073629cycles:ppp:7f3397f784f4Lorg/apache/kafka/common/TopicPartition;::equals (/tmp/perf-3239.map)          swapper 0[007]70951.889555:52939697cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70951.914775:45114901cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms]) ReplicaFetcherT6116[007]70951.936265:47425865cycles:ppp:7f338fc9ad8aregister_finalizer Runtime1stub (/tmp/perf-3239.map)          swapper 0[007]70951.958858:61162448cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms]) ReplicaFetcherT6116[007]70951.987993:61736469cycles:ppp:7f33977d3165Ljava/util/ArrayList;::add (/tmp/perf-3239.map)          swapper 0[007]70952.017399:66816847cycles:ppp: ffffffff9f39ccb4 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70952.049231:63438867cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
/...(repeated continually forapprox 2seconds)/

         swapper 0[007]70954.228216:201536401cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.324191:216277840cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.427185:227564188cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.535554:236205951cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.648051:137191740cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.713384:159810740cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.789490:153089075cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms]) ReplicaFetcherT6116[007]70954.862422:70423359cycles:ppp:7f3397b26fe0Ljava/util/LinkedHashMap;::afterNodeInsertion (/tmp/perf-3239.map)          swapper 0[007]70954.895969:65525177cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])          swapper 0[007]70954.927183:59590550cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms]) ReplicaFetcherT6116[007]70954.955571:58601064cycles:ppp:7f3397782118Ljava/lang/String;::charAt (/tmp/perf-3239.map)          swapper 0[007]70954.983485:57455478cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms]) ReplicaFetcherT6116[007]70955.010857:56837522cycles:ppp:7f33977347b6Ljava/util/concurrent/ConcurrentHashMap;::tabAt (/tmp/perf-3239.map)


|


It looks like the thread (and CPU?) has been descheduled for approx 2 seconds!

I am struggling to find what is causing this behaviour.
It is occurring on 3 Kafka servers running a moderate load (all the same configuration). No other applications are running on the machine (except for fail2ban-server, dnsmasq, node_exporter)

I have made numerous changes to the JDK, OS, and BIOS settings, but none of them have solved the problem. The problem occurred on multiple versions of JDK 8  - I updated to JDK 11 just to get better safepoint diagnostics. I have been hunting this problem for weeks now, and was hoping somebody could provide some help
(suggestions, tools, diagnostic params, tracing tips, anything!)

Thanks,
Ross

---
The current state of the system is:

Running Ubuntu JDK 11 + debug symbols:
  openjdk version "10.0.2" 2018-07-17
  OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
  OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

|
/usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx4G-Xms4G\
-server -XX:+UseG1GC-XX:MaxGCPauseMillis=20\
-XX:InitiatingHeapOccupancyPercent=35\
-XX:+ExplicitGCInvokesConcurrent\
-Djava.awt.headless=true\
-XX:+PreserveFramePointer\
-XX:+UnlockDiagnosticVMOptions\
-XX:+DebugNonSafepoints\
-Xlog:safepoint=trace:safepoint.log:utctime,tags:filecount=20,filesize=102400000\
-XX:+UseCountedLoopSafepoints\
-XX:LoopStripMiningIter=10\
-XX:-Inline\
...

|

Description:    Ubuntu 18.04.1 LTS
Release:    18.04

BOOT_IMAGE=/vmlinuz-4.15.0-36-generic intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll nomodeset net.ifnames=0

Motherboard: SuperMicro X10DRU-i+_R1.02b
Backplane: SuperMicro BPN-SAS-815TQ
Processor: 2.1GHz Intel Xeon-Broadwell (E5-2620-V4)
Firmware Version 3.1 6-8-2018

I have disabled everything power related that I could find in BIOS
(following https://www.thomas-krenn.com/en/wiki/Disable_CPU_Power_Saving_Management_in_BIOS)

$ cat /sys/module/intel_idle/parameters/max_cstate
0
$ cat /sys/devices/system/cpu/cpuidle/current_driver
none
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance


--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to