[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047345#comment-17047345 ]

Mate Szalay-Beko edited comment on ZOOKEEPER-2164 at 2/28/20 11:54 AM:
-----------------------------------------------------------------------

Strange... the client connections should have nothing to do with this fix;
we only fix the leader election here. After a restart / successful leader
election, can you connect to the cluster using, e.g., the ZooKeeper CLI? For me
that worked. If that works, then I think it is some other issue (possibly
related to Curator / NiFi?). Even if the bug seems to be in the ZooKeeper client
code, I suggest opening a separate ticket for it, because this ticket is about
the leader election.

Can you check the client logs / ZooKeeper server logs? There may be some
hints about why NiFi cannot reconnect using Curator.

Unfortunately, I am not familiar with Curator / NiFi... but I am happy to check
what is happening if you can help me reproduce it somehow.



> fast leader election keeps failing
> ----------------------------------
>
>                 Key: ZOOKEEPER-2164
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.7.0, 3.5.8
>
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. 
> When I shut down 2, 1 and 3 keep going back to leader election. Here is what 
> seems to be happening.
> - Both 1 and 3 elect 3 as the leader.
> - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
> follower.
> - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
> time out for 5 seconds: 
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
> - By the time 3 receives votes, 1 has given up trying to connect to 3: 
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
> I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
> while, so I'm guessing later versions have the same issue.
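To make the timing race in the quoted report concrete, here is a minimal Java sketch. It is a hypothetical arithmetic model, not ZooKeeper code: the class and method names are made up for illustration, and the parameters are assumptions. It assumes a 5000 ms connect timeout for connectOne() to the dead server 2, and a follower (server 1) that tries to reach the leader a fixed number of times with a fixed sleep between attempts (5 attempts, 1000 ms apart, is assumed here as roughly what Learner did in 3.4.x). It also ignores how long each individual connect attempt takes.

```java
/** Hypothetical timing model of the ZOOKEEPER-2164 race (not ZooKeeper code). */
class ElectionRaceModel {
    // Assumed: connectOne() to the dead peer blocks for the full connect timeout.
    static final long CONNECT_TIMEOUT_MS = 5000;

    /**
     * Returns true if the follower has already given up by the time the
     * would-be leader finishes being blocked and can process votes.
     *
     * @param followerAttempts number of connect attempts the follower makes (assumed)
     * @param retryIntervalMs  sleep between follower attempts (assumed)
     * @param leaderBlockedMs  how long the would-be leader is stuck in connectOne()
     */
    static boolean followerGivesUpFirst(int followerAttempts, long retryIntervalMs,
                                        long leaderBlockedMs) {
        // The last attempt happens after (attempts - 1) sleeps; each attempt's
        // own duration is ignored in this simplified model.
        long followerWindowMs = Math.max(0, followerAttempts - 1) * retryIntervalMs;
        return leaderBlockedMs > followerWindowMs;
    }

    public static void main(String[] args) {
        // 4 sleeps of 1000 ms give a 4000 ms window, shorter than the 5000 ms block.
        boolean race = followerGivesUpFirst(5, 1000, CONNECT_TIMEOUT_MS);
        System.out.println("follower gives up first: " + race);
    }
}
```

Under these assumptions the follower's retry window (4000 ms) is shorter than the time server 3 spends blocked on server 2 (5000 ms), so server 1 gives up before server 3 is ready to lead, and both go back to leader election.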



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
