[
https://issues.apache.org/jira/browse/ZOOKEEPER-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024809#comment-17024809
]
Suhas Dantkale edited comment on ZOOKEEPER-3707 at 1/28/20 2:06 AM:
--------------------------------------------------------------------
Scenario:
1. 5-node ensemble (SID 1, 2, 3, 4, 5). SID 5 is the current Leader.
2. The test brings down SID 5's ZK process.
3. Leader election begins. Each SID first votes for itself as the leader, as
expected.
4. SID 1 and SID 2 get the notification from SID 3 before they get the
notification from SID 4. They update their votes to propose 3 as the Leader, as
expected, and send notifications.
5. SID 3 receives the notifications from 1, 2 and itself, so its election
termination predicate is satisfied. It goes to the LEADING state, comes out of
FLE and goes to the next phase.
6. SID 2 meanwhile goes to the FOLLOWING state, comes out of FLE and goes to the
next phase (NEWLEADER sending etc.).
So far so good.
7. Meanwhile (somewhere after step 4) SID 1 receives the notification from SID 4,
and since SID 4 > SID 3 (and the zxid is the same), SID 1 changes its mind and
updates its proposal, now to elect 4 as the leader, and sends notifications.
8. SID 4 is electing itself as the leader. Even though SID 2 and SID 3 are out
of the election, SID 4 cannot get out of the election because not enough
nodes are following 3. Only 1 is following 3.
9. SID 1 is also stuck in FLE, like SID 4.
So, in summary, SID 1 and SID 4 are stuck in FLE (lookForLeader()), and SID 2
and SID 3 are stuck in the next phase because SID 3's NEWLEADER is not
acknowledged by a quorum.
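To make the vote arithmetic in steps 4 to 8 concrete, here is a rough Java
sketch. It is only an illustration under assumptions: the class
FleStuckSketch and its methods are hypothetical names and not ZooKeeper's
actual code, the epoch and zxid values are made up, and all nodes are assumed
to share the same epoch and zxid, so the comparison falls through to the SID.

```java
import java.util.Map;

// Simplified illustration only. Class and method names are hypothetical
// and are not ZooKeeper's actual API.
final class FleStuckSketch {

    // True when the incoming vote replaces the current one:
    // higher epoch wins, then higher zxid, then higher SID.
    static boolean supersedes(long newEpoch, long newZxid, long newSid,
                              long curEpoch, long curZxid, long curSid) {
        if (newEpoch != curEpoch) return newEpoch > curEpoch;
        if (newZxid != curZxid) return newZxid > curZxid;
        return newSid > curSid;
    }

    // Number of voters whose current vote names the given leader.
    static int supporters(Map<Integer, Integer> votes, int leader) {
        int count = 0;
        for (int proposed : votes.values()) {
            if (proposed == leader) count++;
        }
        return count;
    }

    // Strict majority of the configured ensemble (SID 5 is down but still counts).
    static boolean hasQuorum(int supporters, int ensembleSize) {
        return supporters > ensembleSize / 2;
    }

    public static void main(String[] args) {
        long epoch = 1L, zxid = 0x100L; // assumed values, identical on all nodes

        // Step 7: SID 1 holds a vote for 3 and receives a vote for 4.
        System.out.println("vote for 4 supersedes vote for 3: "
                + supersedes(epoch, zxid, 4, epoch, zxid, 3));

        // voter SID -> proposed leader SID
        Map<Integer, Integer> afterStep4 = Map.of(1, 3, 2, 3, 3, 3, 4, 4);
        Map<Integer, Integer> afterStep7 = Map.of(1, 4, 2, 3, 3, 3, 4, 4);

        System.out.println("after step 4, quorum for 3: "
                + hasQuorum(supporters(afterStep4, 3), 5));
        System.out.println("after step 7, quorum for 3: "
                + hasQuorum(supporters(afterStep7, 3), 5));
        System.out.println("after step 7, quorum for 4: "
                + hasQuorum(supporters(afterStep7, 4), 5));
    }
}
```

With these assumed votes, 3 has three supporters after step 4, which is why
SID 3 and SID 2 leave FLE. After SID 1 switches in step 7, 3 and 4 each have
two supporters out of five, so neither side reaches a majority.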
> Leadership Election gets stuck in 5 node ensemble
> -------------------------------------------------
>
> Key: ZOOKEEPER-3707
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3707
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.5.5
> Reporter: Suhas Dantkale
> Priority: Major
>
> Scenario:
> 1. 5-node ensemble (SID 1, 2, 3, 4, 5). SID 5 is the current Leader.
> 2. The test brings down SID 5's ZK process.
> 3. Leader election begins. Each SID first votes for itself as the leader,
> as expected.
> 4. SID 1 and SID 2 get the notification from SID 3 before they get the
> notification from SID 4. They update their votes to propose 3 as the Leader,
> as expected, and send notifications.
> 5. SID 3 receives the notifications from 1, 2 and itself, so its election
> termination predicate is satisfied. It goes to the LEADING state, comes out
> of FLE and goes to the next phase.
> 6. SID 2 meanwhile goes to the FOLLOWING state, comes out of FLE and goes to
> the next phase (NEWLEADER sending etc.).
> So far so good.
> 7. Meanwhile (somewhere after step 4) SID 1 receives the notification from
> SID 4, and since SID 4 > SID 3 (and the zxid is the same), SID 1 changes its
> mind and updates its proposal, now to elect 4 as the leader, and sends
> notifications.
> 8. SID 4 is electing itself as the leader. Even though SID 2 and SID 3 are
> out of the election, SID 4 cannot get out of the election because not enough
> nodes are following 3. Only 1 is following 3.
> 9. SID 1 is also stuck in FLE, like SID 4.
> So, in summary, SID 1 and SID 4 are stuck in FLE (lookForLeader()), and SID 2
> and SID 3 are stuck in the next phase because SID 3's NEWLEADER is not
> acknowledged by a quorum.