[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv updated ZOOKEEPER-3109:
----------------------------------
    Description: 
Occasionally, we'll find it takes long time to elect a leader, might longer 
then 1 minute, depends on how big the initLimit and tickTime are set.
  
 This exposes an issue in leader election protocol. During leader election, 
before the voter goes to the LEADING/FOLLOWING state, it will wait for a 
finalizeWait time before changing its state. Depends on the order of 
notifications, some voter might change mind just after it voting for a server. 
If the server it was previous voting for has majority of votes after 
considering this one, then that server will goto LEADING state. In some corner 
cases, the leader may end up with timeout waiting for epoch ACK from majority, 
because of the changed mind voter. This usually happen when there are even 
number of servers in the ensemble (either because one of the server is down or 
being restarted and it takes long time to restart). If there are 5 servers in 
the ensemble, then we'll find two of them in LEADING/FOLLOWING state, another 
two in LOOKING state, but the LOOKING servers cannot join the quorum since 
they're waiting for majority servers FOLLOWING the current leader before 
changing to FOLLOWING as well.
  
 As far as we know, this voter will change mind if it received a vote from 
another host which just started and start to vote itself, or there is a server 
takes long time to shutdown it's previous ZK server and start to vote itself 
when starting the leader election process.
  
 Also the follower may abandon the leader if the leader is not ready for 
accepting learner connection when the follower tried to connect to it.
  
 To solve this issue, there are multiple options: 

1. increase the finalizeWait time

2. smartly detect this state on leader and quit earlier

 
 The 1st option is straightforward and easier to change, but it will cause 
longer leader election time in common cases.
  
 The 2nd option is more complexity, but it can efficiently solve the problem 
without sacrificing the performance in common cases. It remembers the first 
majority servers voting for it, checking if there is anyone changed mind while 
it's waiting for epoch ACK. The leader will wait for sometime before quitting 
LEADING state, since one voter changed may not be a problem if there are still 
majority voters voting for it.

  was:
Occasionally, we'll find it takes long time to elect a leader, might longer 
then 1 minute, depends on how big the initLimit and tickTime are set.
 
This exposes an issue in leader election protocol. During leader election, 
before the voter goes to the LEADING/FOLLOWING state, it will wait for a 
finalizeWait time before changing its state. Depends on the order of 
notifications, some voter might change mind just after it voting for a server. 
If the server it was previous voting for has majority of votes after 
considering this one, then that server will goto LEADING state. In some corner 
cases, the leader may end up with timeout waiting for epoch ACK from majority, 
because of the changed mind voter. This usually happen when there are even 
number of servers in the ensemble (either because one of the server is down or 
being restarted and it takes long time to restart). If there are 5 servers in 
the ensemble, then we'll find two of them in LEADING/FOLLOWING state, another 
two in LOOKING state, but the LOOKING servers cannot join the quorum since 
they're waiting for majority servers FOLLOWING the current leader before 
changing to FOLLOWING as well.
 
As far as we know, this voter will change mind if it received a vote from 
another host which just started and start to vote itself, or there is a server 
takes long time to shutdown it's previous ZK server and start to vote itself 
when starting the leader election process.
 
Also the follower may abandon the leader if the leader is not ready for 
accepting learner connection when the follower tried to connect to it.
 
To solve this issue, there are multiple options: # 
increase the finalizeWait time
 # 
smartly detect this state on leader and quit earlier

 
The 1st option is straightforward and easier to change, but it will cause 
longer leader election time in common cases.
 
The 2nd option is more complexity, but it can efficiently solve the problem 
without sacrificing the performance in common cases. It remembers the first 
majority servers voting for it, checking if there is anyone changed mind while 
it's waiting for epoch ACK. The leader will wait for sometime before quitting 
LEADING state, since one voter changed may not be a problem if there are still 
majority voters voting for it.


> Avoid long unavailable time due to voter changed mind when activating the 
> leader during election
> ------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>    Affects Versions: 3.6.0
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Major
>             Fix For: 3.6.0
>
>
> Occasionally, we'll find it takes long time to elect a leader, might longer 
> then 1 minute, depends on how big the initLimit and tickTime are set.
>   
>  This exposes an issue in leader election protocol. During leader election, 
> before the voter goes to the LEADING/FOLLOWING state, it will wait for a 
> finalizeWait time before changing its state. Depends on the order of 
> notifications, some voter might change mind just after it voting for a 
> server. If the server it was previous voting for has majority of votes after 
> considering this one, then that server will goto LEADING state. In some 
> corner cases, the leader may end up with timeout waiting for epoch ACK from 
> majority, because of the changed mind voter. This usually happen when there 
> are even number of servers in the ensemble (either because one of the server 
> is down or being restarted and it takes long time to restart). If there are 5 
> servers in the ensemble, then we'll find two of them in LEADING/FOLLOWING 
> state, another two in LOOKING state, but the LOOKING servers cannot join the 
> quorum since they're waiting for majority servers FOLLOWING the current 
> leader before changing to FOLLOWING as well.
>   
>  As far as we know, this voter will change mind if it received a vote from 
> another host which just started and start to vote itself, or there is a 
> server takes long time to shutdown it's previous ZK server and start to vote 
> itself when starting the leader election process.
>   
>  Also the follower may abandon the leader if the leader is not ready for 
> accepting learner connection when the follower tried to connect to it.
>   
>  To solve this issue, there are multiple options: 
> 1. increase the finalizeWait time
> 2. smartly detect this state on leader and quit earlier
>  
>  The 1st option is straightforward and easier to change, but it will cause 
> longer leader election time in common cases.
>   
>  The 2nd option is more complexity, but it can efficiently solve the problem 
> without sacrificing the performance in common cases. It remembers the first 
> majority servers voting for it, checking if there is anyone changed mind 
> while it's waiting for epoch ACK. The leader will wait for sometime before 
> quitting LEADING state, since one voter changed may not be a problem if there 
> are still majority voters voting for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to