[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maoling updated ZOOKEEPER-3608:
-------------------------------
    Description: 
Users may be confused by the two variables *acceptedEpoch* and *currentEpoch* 
introduced by this ticket.

The implementation up to version 3.3.3 did not include the epoch variables 
*acceptedEpoch* and *currentEpoch*. This omission caused problems in a 
production version and was noticed by many ZooKeeper clients.

- *acceptedEpoch*: the epoch number of the last *NEWEPOCH* message accepted;
- *currentEpoch*: the epoch number of the last *NEWLEADER* message accepted.
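
As a rough illustration of the two definitions above, the sketch below tracks both values for a single follower. The class name EpochState and its methods are hypothetical and are not ZooKeeper's actual code. It also assumes a simplified acceptance rule: a NEWEPOCH is accepted only if its epoch is strictly larger than acceptedEpoch, and a NEWLEADER only if its epoch equals acceptedEpoch.

```java
/** Hypothetical illustration only; not ZooKeeper's actual classes. */
final class EpochState {
    private long acceptedEpoch; // epoch of the last NEWEPOCH message accepted
    private long currentEpoch;  // epoch of the last NEWLEADER message accepted

    EpochState(long acceptedEpoch, long currentEpoch) {
        this.acceptedEpoch = acceptedEpoch;
        this.currentEpoch = currentEpoch;
    }

    /** Handles NEWEPOCH; assumed rule: accept only a strictly larger epoch. */
    boolean onNewEpoch(long proposedEpoch) {
        if (proposedEpoch <= acceptedEpoch) {
            return false; // stale or repeated proposal, state unchanged
        }
        acceptedEpoch = proposedEpoch; // the epoch has been tried, not yet joined
        return true;
    }

    /** Handles NEWLEADER; assumed rule: accept only the epoch already accepted. */
    boolean onNewLeader(long leaderEpoch) {
        if (leaderEpoch != acceptedEpoch) {
            return false; // NEWLEADER for an epoch this follower never accepted
        }
        currentEpoch = leaderEpoch; // the epoch is now joined
        return true;
    }

    long acceptedEpoch() { return acceptedEpoch; }

    long currentEpoch() { return currentEpoch; }
}
```

Under these assumptions, a follower that has accepted NEWEPOCH for epoch 2 but has not yet seen NEWLEADER reports acceptedEpoch = 2 while currentEpoch keeps its previous value.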

The problem originates at the beginning of the *Recovery* phase, when the 
leader increments its epoch (contained in *lastZxid*) before it has acquired a 
quorum of successfully connected followers (such a leader is called a *false 
leader*). A follower goes back to *FLE* (Fast Leader Election) if its own epoch 
is larger than the leader’s epoch. So when a *false leader* drops leadership 
and becomes a follower of a leader from a previous epoch, it finds that the 
leader’s epoch is smaller than its own and goes back to *FLE*. This behavior 
can loop, switching between the *Recovery* phase and *FLE* repeatedly.
Because *lastZxid* is the only place where the epoch number is stored, the 
implementation cannot distinguish between a *tried* epoch and a *joined* epoch. 
Drawing that distinction is the purpose of *acceptedEpoch* (tried) and 
*currentEpoch* (joined), respectively, so omitting them leads to the problem 
described above.
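
The loop described above can be sketched in a few lines. This is a simplified model, not the actual ZooKeeper code or the actual fix: the class FalseLeaderSketch and all of its methods are hypothetical, and the "go back to FLE" check is reduced to a single epoch comparison. The only fact relied on is that a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits.

```java
/** Hypothetical sketch; names and checks are illustrative, not ZooKeeper's code. */
final class FalseLeaderSketch {
    /** Extracts the epoch from the high 32 bits of a zxid. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Builds a zxid from an epoch (high 32 bits) and a counter (low 32 bits). */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /** Old behaviour as described: the only known epoch is the one in lastZxid. */
    static boolean goesBackToFleOld(long followerLastZxid, long leaderLastZxid) {
        return epochOf(followerLastZxid) > epochOf(leaderLastZxid);
    }

    /** Assumed behaviour with separate variables: compare only the joined epoch. */
    static boolean goesBackToFleNew(long followerCurrentEpoch, long leaderEpoch) {
        return followerCurrentEpoch > leaderEpoch;
    }

    public static void main(String[] args) {
        long leaderLastZxid = makeZxid(1, 7);

        // A false leader bumped the epoch in lastZxid to 2 without a quorum.
        long falseLeaderLastZxid = makeZxid(2, 0);
        System.out.println(goesBackToFleOld(falseLeaderLastZxid, leaderLastZxid)); // true

        // With the split, the attempt would only be recorded in acceptedEpoch (2),
        // while currentEpoch stays at the last epoch actually joined (1).
        long currentEpoch = 1;
        System.out.println(goesBackToFleNew(currentEpoch, epochOf(leaderLastZxid))); // false
    }
}
```

In this model the tried epoch no longer leaks into the comparison, so the former false leader can follow the leader of epoch 1 instead of returning to *FLE*.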

More details can be found in this report: _*ZooKeeper’s atomic broadcast 
protocol: Theory and practice. André Medeiros, March 20, 2012*_

> add a documentation about currentEpoch and acceptEpoch
> ------------------------------------------------------
>
>                 Key: ZOOKEEPER-3608
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3608
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: documentation, leaderElection, server
>            Reporter: maoling
>            Assignee: maoling
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
