[
https://issues.apache.org/jira/browse/ZOOKEEPER-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176339#comment-17176339
]
Antoine Tran commented on ZOOKEEPER-3466:
-----------------------------------------
Hi, we have the exact same symptoms. We have been able to reproduce it a few
times, but only after an hour or so. Our env:
* physical machines or cloud VMs
* Red Hat 7.7
* Rancher 2.4.3, RKE, Kubernetes 1.17.5
* ZooKeeper 3.6.1 deployed as a static 3-node HA cluster, as a Kubernetes
StatefulSet
Everything works for a while; then, at some point, the leader seems unable to
make progress and the followers can't do anything.
Some logs:
{code:java}
[rancher@dl2psto-s00 infrastructure]$ kl2 exec -it zookeeper-cl-0 -- sh -c
"echo srvr | nc localhost 2181"
Zookeeper version: 3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on
04/21/2020 15:01 GMT
Latency min/avg/max: 0/0.3019/12
Received: 15962
Sent: 7162
Connections: 11
Outstanding: 8799
Zxid: 0x200000000
Mode: leader
Node count: 256718
Proposal sizes last/min/max: 48/36/41704
[rancher@dl2psto-s00 infrastructure]$ kl2 exec -it zookeeper-cl-1 -- sh -c
"echo srvr | nc localhost 2181"
Zookeeper version: 3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on
04/21/2020 15:01 GMT
Latency min/avg/max: 0/0.6582/9
Received: 9125
Sent: 620
Connections: 6
Outstanding: 8416
Zxid: 0x10037dee1
Mode: follower
Node count: 256718
[rancher@dl2psto-s00 infrastructure]$ kl2 exec -it zookeeper-cl-2 -- sh -c
"echo srvr | nc localhost 2181"
Zookeeper version: 3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on
04/21/2020 15:01 GMT
Latency min/avg/max: 0/0.25/9
Received: 17539
Sent: 9039
Connections: 11
Outstanding: 8406
Zxid: 0x10037dee1
Mode: follower
Node count: 256718{code}
I notice that the leader always has
{code:java}
Zxid: 0x200000000{code}
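A zxid encodes the leader epoch in the high 32 bits and the transaction
counter in the low 32 bits, so 0x200000000 reads as "epoch 2, counter 0" while
the followers' 0x10037dee1 is still an epoch-1 zxid. That suggests the leader
opened a new epoch but nothing has been committed in it yet. A quick decode of
the two values above (my own sketch, not part of the original report):
{code:java}
// Sketch: split the two zxids from the srvr output above into (epoch, counter).
public class ZxidDecode {
    public static void main(String[] args) {
        long leaderZxid   = 0x200000000L;  // leader's srvr output
        long followerZxid = 0x10037dee1L;  // followers' srvr output
        System.out.printf("leader:   epoch=%d counter=%d%n",
                leaderZxid >>> 32, leaderZxid & 0xffffffffL);
        System.out.printf("follower: epoch=%d counter=%d%n",
                followerZxid >>> 32, followerZxid & 0xffffffffL);
    }
}
{code}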
TCP connections succeed, but
{code:java}
zkCli.sh ls /{code}
times out no matter which server we point it at.
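For completeness, the same timeout can be observed with the plain Java client.
This is just a minimal sketch of mine (the connect string and the 15 s session
timeout are arbitrary assumptions), not something from our deployment:
{code:java}
// Sketch: the TCP connect succeeds, but the session never reaches CONNECTED,
// so listing "/" is never answered and the check times out, mirroring zkCli.sh.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class LsRootCheck {
    public static void main(String[] args) throws Exception {
        String connect = args.length > 0 ? args[0] : "localhost:2181";
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connect, 15000, event -> {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        try {
            if (connected.await(15, TimeUnit.SECONDS)) {
                System.out.println("ls /: " + zk.getChildren("/", false));
            } else {
                System.err.println("session never reached CONNECTED"
                        + " (the bad state described above)");
            }
        } finally {
            zk.close();
        }
    }
}
{code}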
ZooKeeper configuration:
{code:java}
syncLimit=5
dataDir=/space/ZookeeperData/datadir
clientPort=2181
maxClientCnxns=6000
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=zookeeper-cl-0.zk-leader:2888:3888
server.2=0.0.0.0:2888:3888
server.3=zookeeper-cl-2.zk-leader:2888:3888
{code}
One particularity of our deployment is that the ZooKeeper dataDir is on a
tmpfs. We therefore suspected we had hit a hard limit (the 6 GiB container
memory hard limit), but our memory consumption looks fine:
{code:java}
[rancher@dl2psto-s00 infrastructure]$ kl2 exec -it zookeeper-cl-1 -- df -h |
grep datadir
tmpfs 126G 585M 126G 1% /space/ZookeeperData/datadir
docker stats (3 nodes): ~687.3MiB / 6GiB
{code}
> ZK cluster converges, but does not properly handle client connections (new in
> 3.5.5)
> ------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3466
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.5
> Environment: Linux
> Reporter: Jan-Philip Gehrcke
> Priority: Major
>
> Hey, we are exploring switching from ZooKeeper 3.4.14 to ZooKeeper 3.5.5 in
> [https://github.com/dcos/dcos].
> DC/OS coordinates ZooKeeper via Exhibitor. We are not changing anything
> w.r.t. Exhibitor for now, and are hoping that we can use ZooKeeper 3.5.5 as a
> drop-in replacement for 3.4.14. This seems to work fine when Exhibitor uses a
> so-called static ensemble where the individual ZooKeeper instances are known
> a priori.
> However, when Exhibitor discovers the individual ZooKeeper instances
> dynamically ("dynamic" back-end), I think we observe a regression where
> ZooKeeper 3.5.5 can get into the following bad state (often, but not always):
> # three ZooKeeper instances find each other, leader election takes place
> (*expected*)
> # leader election succeeds: two followers, one leader (*expected*)
> # all three ZK instances respond IAMOK to RUOK (*expected*)
> # all three ZK instances respond to SRVR (one says "Mode: leader", the other
> two say "Mode: follower") (*expected*)
> # all three ZK instances respond to MNTR and show plausible output
> (*expected*)
> # *{color:#ff0000}Unexpected:{color}* any ZooKeeper client trying to connect
> to any of the three nodes observes a "connection timeout", whereas notably
> this is *not* a TCP connect() timeout. The TCP connect() succeeds, but then
> ZK does not seem to send the expected byte sequence to the TCP connection,
> and the ZK client waits for it via recv() until it hits a timeout condition.
> Examples for two different clients:
> ## In Kazoo we specifically hit _Connection time-out: socket time-out during
> read_
> generated here:
> [https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/protocol/connection.py#L249]
> ## In zkCli we see _Client session timed out, have not heard from server in
> 15003ms for sessionid 0x0, closing socket connection and attempting reconnect
> (org.apache.zookeeper.ClientCnxn:main-SendThread(localhost:2181))_
> # This state is stable; it lasts forever (well, at least for multiple
> hours; we didn't test longer than that).
> # In our system the ZooKeeper clients are crash-looping. They retry. What I
> have observed is that while they retry, the ZK ensemble accumulates
> outstanding requests, as shown in this MNTR output (emphasis mine):
> zk_packets_received 2008
> zk_packets_sent 127
> zk_num_alive_connections 18
> zk_outstanding_requests *1880*
> # The leader emits log lines confirming session timeout, example:
> _[myid:3] INFO [SessionTracker:ZooKeeperServer@398] - Expiring session
> 0x2000642b18f0020, timeout of 10000ms exceeded [myid:3] INFO
> [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession
> request for session 0x2000642b18f0020_
> # In this state, restarting either of the two ZK followers results in the
> same state (clients still don't get data from ZK upon connect).
> # In this state, restarting the ZK leader, and therefore triggering a leader
> re-election, almost immediately results in all clients being able to connect
> to all ZK instances successfully.