[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058718#comment-17058718
 ] 

Dai Shi commented on ZOOKEEPER-3756:
------------------------------------

Hi Mate,

Thanks for taking a look at this! I suspected it may have something to do with 
the IPs, so hoping you may be able to explain more clearly what might be 
happening :)

We are running zookeeper inside kubernetes as a stateful set. The IP 
100.126.116.201 is the actual IP of the pod that the leader, member 3, is 
running in. Unfortunately as far as I know you cannot set static IPs for pods 
in kubernetes, so that is why the IPs in the configs do not match the IP that 
zookeeper binds to since we cannot know in advance what the pod IP will be. We 
have quorumListenOnAllIPs=true in the config. The IPs in the config point to 
kubernetes services, which remain static, each of which select the single 
corresponding pod as the backend.

Just for reference, prior to upgrading to 3.5 we were using 0.0.0.0 in our 
configs, but only in a 3 member cluster. As far as I can tell, there are no 
issues with member rejoining the quorum with either the 3.4 config or 3.5 
config in a 3 member cluster. However, there is one major issue in both that if 
the leader pod is restarted, the cluster becomes hard down until the pod 
restarts. It seems that the leader election does not take place until the pod 
comes back and the zookeeper process starts, so even though the logs say leader 
election took 1ms, the cluster was down for maybe 30 to 60 seconds.

This is what led to me trying a 5 member cluster to see if that would trigger 
the leader election quicker, but it led to me discovering this behavior 
instead. I still don't understand why if I remove servers 4 and 5 from the 
config of servers 1 and 2 they can then successfully rejoin.

I've since shifted the cluster back down to 3 members, but if you think it 
would be helpful to get a log from all 5 members I can repeat the experiment 
again.

Thanks again for your help!

> Members failing to rejoin quorum
> --------------------------------
>
>                 Key: ZOOKEEPER-3756
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: leaderElection
>    Affects Versions: 3.5.6, 3.5.7
>            Reporter: Dai Shi
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5:
> In a 5 member quorum, when server 3 is the leader and each server has this in 
> their configuration: 
> {code:java}
> server.1=100.71.255.254:2888:3888:participant;2181
> server.2=100.71.255.253:2888:3888:participant;2181
> server.3=100.71.255.252:2888:3888:participant;2181
> server.4=100.71.255.251:2888:3888:participant;2181
> server.5=100.71.255.250:2888:3888:participant;2181{code}
> If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in 
> the logs:
> {code:java}
> 2020-03-11 20:23:35,720 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - 
> LOOKING
> 2020-03-11 20:23:35,721 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885]
>  - New election. My id =  2, proposed zxid=0x1b8005f4bba
> 2020-03-11 20:23:35,733 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (3, 2)
> 2020-03-11 20:23:35,734 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36140
> 2020-03-11 20:23:35,735 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (4, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (5, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36142
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message 
> format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config 
> version)
> 2020-03-11 20:23:35,742 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>         at 
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
> 2020-03-11 20:23:35,744 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  
> id 3 my id = 2
> 2020-03-11 20:23:35,745 [myid:2] - WARN  
> [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting 
> SendWorker{code}
> The only way I can seem to get them to rejoin the quorum is to restart the 
> leader.
> However, if I remove server 4 and 5 from the configuration of server 1 or 2 
> (so only servers 1, 2, and 3 remain in the configuration file), then they can 
> rejoin the quorum fine. Is this expected and am I doing something wrong? Any 
> help or explanation would be greatly appreciated. Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to