[
https://issues.apache.org/jira/browse/KAFKA-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mickael Maison resolved KAFKA-7349.
-----------------------------------
Resolution: Won't Fix
We are now removing ZooKeeper support, so we are closing this issue.
> Long Disk Writes cause Zookeeper Disconnects
> --------------------------------------------
>
> Key: KAFKA-7349
> URL: https://issues.apache.org/jira/browse/KAFKA-7349
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.11.0.1
> Reporter: Adam Kafka
> Priority: Minor
> Attachments: SpikeInWriteTime.png
>
>
> We run our Kafka cluster on a cloud service provider. As a consequence, we
> observe a large tail latency on disk writes that is out of our control: some
> writes take on the order of seconds. We have noticed that these long write
> times are often correlated with subsequent ZooKeeper disconnects on the
> brokers. It appears that during the long write, the ZooKeeper heartbeat
> thread does not get scheduled CPU time, resulting in a long gap between
> heartbeats. After the write completes, the thread is scheduled again, but it
> detects that it has not heard from ZooKeeper in a while, so it drops its
> connection and then rejoins the cluster.
> Note that the gap reported in the logs is inconsistent with the configured
> timeout ({{zookeeper.session.timeout.ms}}, default 6 seconds). We have seen
> a range of reported values, including 5950ms (just under the threshold),
> 12032ms (roughly double it), and 25999ms (far beyond it).
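> One mitigation we have considered (our own assumption, not a confirmed fix)
> is raising the session timeout in the broker configuration so that scheduling
> gaps shorter than the new threshold no longer drop the session, e.g. in
> {{server.properties}}:
> {code}
> # Hypothetical override: tolerate longer gaps between heartbeats before
> # the ZooKeeper session is considered expired (default is 6000).
> zookeeper.session.timeout.ms=18000
> {code}
> Of course this only widens the window; it does not address the underlying
> scheduling contention, and the ZooKeeper ensemble may clamp the value to the
> min/max session timeouts it allows.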
> We noticed that during a service degradation of our cloud provider's storage
> service, these ZooKeeper disconnects increased drastically in frequency.
> We are hoping there is a way to decouple these components. Do you agree with
> our diagnosis that the ZooKeeper disconnects are caused by thread contention
> during long disk writes? Could the ZooKeeper thread be scheduled at a higher
> priority? Do you have any suggestions for how to avoid these disconnects?
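> Our reading of the client loop (a simplified pseudocode sketch, not the
> actual ClientCnxn implementation) is:
> {code}
> // Runs on the ZooKeeper client's send/heartbeat thread:
> loop:
>     now = currentTimeMillis()
>     if now - lastHeardFromServer > sessionTimeout:
>         // If this thread was starved of CPU during a long disk write,
>         // the elapsed gap can be far larger than the threshold by the
>         // time this check finally runs, which is consistent with the
>         // 12032ms and 25999ms values in our logs.
>         warn("Client session timed out ...")
>         closeSocketAndReconnect()
>     else:
>         sendPing()
>         sleepUntilNextHeartbeat()
> {code}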
> Here is an example of one of these events:
> Logs on the Broker:
> {code}
> [2018-08-25 04:10:19,695] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:21,697] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:23,700] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:25,701] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:27,702] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:29,704] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:31,707] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:33,709] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:35,712] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:37,714] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:39,716] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:41,719] DEBUG Got ping response for sessionid:
> 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> ...
> [2018-08-25 04:10:53,752] WARN Client session timed out, have not heard from
> server in 12032ms for sessionid 0x36202ab4337002c
> (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:53,754] INFO Client session timed out, have not heard from
> server in 12032ms for sessionid 0x36202ab4337002c, closing socket connection
> and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:53,920] INFO zookeeper state changed (Disconnected)
> (org.I0Itec.zkclient.ZkClient)
> [2018-08-25 04:10:53,920] INFO Waiting for keeper state SyncConnected
> (org.I0Itec.zkclient.ZkClient)
> ...
> {code}
> GC logs during the same period (demonstrating that this is not just a long
> GC pause):
> {code}
> 2018-08-25T04:10:36.434+0000: 35150.779: [GC (Allocation Failure)
> 3074119K->2529089K(6223360K), 0.0137342 secs]
> 2018-08-25T04:10:37.367+0000: 35151.713: [GC (Allocation Failure)
> 3074433K->2528524K(6223360K), 0.0127938 secs]
> 2018-08-25T04:10:38.274+0000: 35152.620: [GC (Allocation Failure)
> 3073868K->2528357K(6223360K), 0.0131040 secs]
> 2018-08-25T04:10:39.220+0000: 35153.566: [GC (Allocation Failure)
> 3073701K->2528885K(6223360K), 0.0133247 secs]
> 2018-08-25T04:10:40.175+0000: 35154.520: [GC (Allocation Failure)
> 3074229K->2528639K(6223360K), 0.0127870 secs]
> 2018-08-25T04:10:41.084+0000: 35155.429: [GC (Allocation Failure)
> 3073983K->2530769K(6223360K), 0.0135058 secs]
> 2018-08-25T04:10:42.012+0000: 35156.358: [GC (Allocation Failure)
> 3076113K->2531772K(6223360K), 0.0165919 secs]
> 2018-08-25T04:10:53.737+0000: 35168.083: [GC (Allocation Failure)
> 3077116K->2529630K(6223360K), 0.0135440 secs]
> 2018-08-25T04:10:58.515+0000: 35172.860: [GC (Allocation Failure)
> 3074974K->2531144K(6223360K), 0.0164601 secs]
> 2018-08-25T04:11:03.149+0000: 35177.495: [GC (Allocation Failure)
> 3076488K->2531751K(6223360K), 0.0146978 secs]
> {code}
> The attached screenshot shows a spike in write time to our data volume at the
> same second. Note that this is an average value, so it does not represent the
> maximum write time, only a sample of write times.
> !SpikeInWriteTime.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)