[ https://issues.apache.org/jira/browse/ZOOKEEPER-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168260#comment-16168260 ]

Yicheng Fang edited comment on ZOOKEEPER-2899 at 9/15/17 5:54 PM:
------------------------------------------------------------------

We tried running 'ZxidRolloverTest' with different setups but failed to 
reproduce the issue, so we decided to use the same hardware as production. The 
experiments below used a 5-node ZK ensemble, with 
zookeeper.testingonly.initialZxid set to a high enough value:

1. With tiny scripts using kazoo, spawn client processes that each 
continuously create random ZK nodes and set data on them, generating the 
same number of connections as production, while another set of clients 
randomly reads data from the nodes (see the kazoo sketch after this list).
   - Result: ZXID overflowed. Leader election completed within 5 seconds. A 
short burst of errors was seen on the client side, but the clients recovered 
right after the election.

2. Set up an 85-node Kafka broker cluster, then trigger the overflow with the 
same method as in 1.
  - Result: same as 1. The Kafka brokers behaved normally.

3. Set up a test tool to generate ~100k messages/s for the Kafka cluster, and 
as many consumers as needed to reach the 1500-per-node connection count. The 
consumers write consumption offsets to ZK every 10 ms (see the offset-writer 
sketch after this list).
  - We noticed that after the ZXID overflowed a couple of times, the whole 
system began acting strangely: metrics from the brokers became sporadic, ISRs 
became flappy, the metric volume sent by Kafka dropped, etc. See attachments 
'message_in_per_sec.png', 'metric_volume.png', and 'GC_metric.png' for 
screenshots.
  - From the 'srvr' stats, latency became '0/[>100]/[>200]', vs. '0/0/[<100]' 
under normal conditions (see the 'srvr' polling sketch after this list). 
Profiling ZK revealed that this was because the ensemble received such a high 
QPS of write traffic (presumably from the Kafka consumers) that the 
'submittedRequests' queue in the leader's 'PrepRequestProcessor' filled up, 
causing even reads to have high latencies.
  - It looked to us as though electing a new leader at the overflow somehow 
caused the consumers to align, thus DDoSing the ensemble. However, we did not 
observe the same behavior after bouncing the leader process BEFORE the 
overflow. The ensemble should behave similarly in both cases since both call 
for new leader elections. One difference we noticed, though, was that in the 
overflow case the leader election port was left open, so the downed leader 
would participate in the new round of leader election. Not sure if it's 
related, but it seemed worth bringing up.
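
For reference, a minimal sketch of the kind of kazoo load script used in 
experiment 1. Host names, paths, and process counts here are illustrative, not 
the exact scripts we ran:

{code}
# Write-load clients: each process continuously creates random nodes and sets
# data on them; readers poll random nodes. The servers were started with
# -Dzookeeper.testingonly.initialZxid=<high value> so the rollover is reached
# quickly.
import os
import random
import time
from multiprocessing import Process

from kazoo.client import KazooClient

HOSTS = "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181"  # illustrative ensemble
ROOT = "/rollover-test"

def writer(worker_id, num_nodes=100):
    zk = KazooClient(hosts=HOSTS)
    zk.start()
    zk.ensure_path(ROOT)
    while True:
        path = "%s/w%d-%d" % (ROOT, worker_id, random.randrange(num_nodes))
        payload = os.urandom(64)
        if zk.exists(path):
            zk.set(path, payload)       # every set/create consumes one zxid
        else:
            zk.create(path, payload)

def reader():
    zk = KazooClient(hosts=HOSTS)
    zk.start()
    while True:
        children = zk.get_children(ROOT)
        if children:
            zk.get(ROOT + "/" + random.choice(children))
        time.sleep(0.01)

if __name__ == "__main__":
    procs = [Process(target=writer, args=(i,)) for i in range(50)]
    procs += [Process(target=reader) for _ in range(50)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
{code}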
    
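Similarly, a rough sketch of the 10 ms offset-write pattern from experiment 3. 
In the real test the writes came from the Kafka consumers themselves, so the 
group/topic names and paths below are only illustrative:

{code}
# Each consumer persists its consumption offset to ZK every 10 ms, so the
# aggregate write QPS scales with the number of consumers.
import time

from kazoo.client import KazooClient

HOSTS = "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181"  # illustrative ensemble

def offset_writer(group, topic, partition):
    zk = KazooClient(hosts=HOSTS)
    zk.start()
    path = "/consumers/%s/offsets/%s/%d" % (group, topic, partition)
    zk.ensure_path(path)
    offset = 0
    while True:
        offset += 1
        zk.set(path, str(offset).encode())  # one write per consumer per 10 ms
        time.sleep(0.01)

if __name__ == "__main__":
    offset_writer("test-group", "test-topic", 0)
{code}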

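And a small sketch of how the latency numbers quoted above can be collected, by 
polling the 'srvr' four-letter command on each server and parsing its 
'Latency min/avg/max:' line (host names are illustrative):

{code}
import socket

def srvr_latency(host, port=2181, timeout=5.0):
    """Return the (min, avg, max) latency reported by a server's 'srvr' output."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        sock.shutdown(socket.SHUT_WR)
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    for line in data.decode().splitlines():
        if line.startswith("Latency min/avg/max:"):
            return tuple(float(x) for x in line.split(":")[1].strip().split("/"))
    raise RuntimeError("no latency line in 'srvr' output from %s" % host)

if __name__ == "__main__":
    for h in ("zk1", "zk2", "zk3", "zk4", "zk5"):
        print(h, srvr_latency(h))
{code}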


> Zookeeper not receiving packets after ZXID overflows
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2899
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2899
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>         Environment: 5 host ensemble, 1500+ client connections each, 300K+ 
> nodes
> OS: Ubuntu precise
> JAVA 7
> Juniper QFX5100-48T NIC, 10000Mb/s, ixgbe driver
> 6 core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> 4 HDD 600G each 
>            Reporter: Yicheng Fang
>         Attachments: GC_metric.png, image12.png, image13.png, 
> message_in_per_sec.png, metric_volume.png, zk_20170309_wo_noise.log
>
>
> ZK was used with Kafka (version 0.10.0) for coordination. We had a lot of 
> Kafka consumers writing consumption offsets to ZK.
> We observed the issue twice within the last year. Each time, after the ZXID 
> overflowed, ZK was not receiving packets even though leader election looked 
> successful in the logs and the ZK servers were up. As a result, the whole 
> Kafka system came to a halt.
> As an attempt to reproduce (and hopefully fix) the issue, I set up test ZK 
> and Kafka clusters and fed them with production-like test traffic. Though 
> not really able to reproduce the issue, I did see that the Kafka consumers, 
> which used ZK clients, essentially DOSed the ensemble, filling up the 
> `submittedRequests` queue in `PrepRequestProcessor` and causing even reads 
> to see 100ms+ latencies.
> More details are included in the comments.


