[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962338#comment-16962338
 ] 

Sebastian Schmitz commented on ZOOKEEPER-2164:
----------------------------------------------

Hello,

I have a similar problem. Upgrading from 3.4.14 to 3.5.5 or 3.5.6 fails 
during leader election if I start the nodes in the order 3-2-1. If I start 
them in the order 1-2-3, it works.
The first upgrade I did in my test environment ran fine, probably because 
during the rolling upgrade some nodes were still running 3.4.14 while the 
others were updated to 3.5.6 in the order 1-2-3. But last night, during a 
configuration change of another component, the ZooKeeper containers were 
redeployed and the problem appeared. 

I wanted to attach the logs here, but the upload did not work, and I don't 
want to paste them because they are about 2000 lines long. If someone wants 
to see the logs of both startup orders, just ask me ;)

Best regards

Sebastian

> fast leader election keeps failing
> ----------------------------------
>
>                 Key: ZOOKEEPER-2164
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Priority: Major
>             Fix For: 3.6.0, 3.5.7
>
>
> I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. 
> When I shut down 2, 1 and 3 keep going back to leader election. Here is what 
> seems to be happening.
> - Both 1 and 3 elect 3 as the leader.
> - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
> follower.
> - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
> time out for 5 seconds: 
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
> - By the time 3 receives votes, 1 has given up trying to connect to 3: 
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
> I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
> while, so I'm guessing later versions have the same issue.
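
The timing race in the quoted description can be sketched as a small timeline model. This is an illustration only, not ZooKeeper code: the class and method names are hypothetical, and the follower's retry budget (5 attempts, 1 second apart, each refused immediately) is an assumption chosen for the example. Only the 5-second connect timeout is taken from the description.

```java
// Toy timeline model of the race described in the issue. NOT ZooKeeper code:
// all names and the follower retry numbers are illustrative assumptions.
final class ElectionRaceSketch {

    /** Time (ms) at which the follower stops trying to reach the elected leader. */
    static long followerGivesUpAtMs(int attempts, long pauseBetweenAttemptsMs) {
        // Assumption: each attempt is refused immediately because the
        // leader-elect is not accepting followers yet, so only the pauses
        // between attempts consume time.
        return (attempts - 1) * pauseBetweenAttemptsMs;
    }

    /** Time (ms) at which the leader-elect has finally seen the votes. */
    static long leaderReadyAtMs(long blockedConnectTimeoutMs) {
        // The leader-elect is stuck in a blocking connect to the dead peer
        // and only processes votes once that connect times out.
        return blockedConnectTimeoutMs;
    }

    /** True if the follower gives up before the leader-elect is ready. */
    static boolean electionRoundFails(int attempts, long pauseMs, long connectTimeoutMs) {
        return followerGivesUpAtMs(attempts, pauseMs) < leaderReadyAtMs(connectTimeoutMs);
    }

    public static void main(String[] args) {
        // 5-second blocked connect (from the issue) vs. assumed retry budget.
        System.out.println("fails with 5000 ms timeout: "
                + electionRoundFails(5, 1000L, 5000L));
        // A shorter connect timeout lets the leader-elect catch up in time.
        System.out.println("fails with 1000 ms timeout: "
                + electionRoundFails(5, 1000L, 1000L));
    }
}
```

Under these assumed numbers the follower gives up after about 4 seconds while the leader-elect is blocked for 5, so the round fails and both nodes return to leader election. In this model the round succeeds only when the blocked connect times out no later than the follower's retry window.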



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
