[
https://issues.apache.org/jira/browse/ZOOKEEPER-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
maoling updated ZOOKEEPER-3608:
-------------------------------
Description:
Users may be confused by these two variables, *acceptedEpoch* and
*currentEpoch*, introduced by this ticket.
The implementation up to version 3.3.3 did not include the epoch variables
*acceptedEpoch* and *currentEpoch*. This omission caused problems in a
production version and was noticed by many ZooKeeper clients.
- *acceptedEpoch*: the epoch number of the last *NEWEPOCH* message accepted;
- *currentEpoch*: the epoch number of the last *NEWLEADER* message accepted.
The problem originates at the beginning of the *Recovery* phase, when the
leader increments its epoch (contained in *lastZxid*) even before acquiring a
quorum of successfully connected followers (such a leader is called a *false
leader*). Since a follower goes back to *FLE* (Fast Leader Election) if its
epoch is larger than the leader's epoch, when a *false leader* drops
leadership and becomes a follower of a leader from a previous epoch, it finds
a smaller epoch and goes back to *FLE*. This behavior can loop, switching from
the *Recovery* phase to *FLE* repeatedly.
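
To make "epoch contained in *lastZxid*" concrete, here is a minimal sketch of
the zxid layout, assuming the usual 64-bit split (epoch in the high 32 bits,
per-epoch counter in the low 32 bits). The class and method names are
hypothetical and are not ZooKeeper's own API; nextEpochZxid only illustrates
the premature increment described above.

```java
// Hypothetical helper for illustration only; not a ZooKeeper class.
final class ZxidSketch {
    private ZxidSketch() {}

    // Assumed layout: the high 32 bits of a zxid hold the epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Assumed layout: the low 32 bits hold the per-epoch counter.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // What a "false leader" effectively does: derive a new epoch from
    // lastZxid before a quorum of followers has confirmed it.
    static long nextEpochZxid(long lastZxid) {
        return makeZxid(epochOf(lastZxid) + 1, 0);
    }
}
```

Because the bumped epoch lives only inside the zxid in this sketch, nothing
records whether that epoch was ever confirmed by a quorum.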
Consequently, because *lastZxid* alone stores the epoch number, the
implementation cannot distinguish a *tried* epoch from a *joined* epoch.
These are the respective purposes of *acceptedEpoch* and *currentEpoch*, so
omitting them causes such problems.
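
The tried/joined distinction can be sketched as two separately tracked
values. This is a simplified illustration, not the actual server code: the
class name, method names, and the exact acceptance rules (strictly greater
for NEWEPOCH, exact match for NEWLEADER) are assumptions made for the
example.

```java
// Hypothetical follower-side state for illustration; not ZooKeeper's code.
final class EpochStateSketch {
    private long acceptedEpoch; // epoch of the last NEWEPOCH accepted ("tried")
    private long currentEpoch;  // epoch of the last NEWLEADER accepted ("joined")

    EpochStateSketch(long acceptedEpoch, long currentEpoch) {
        this.acceptedEpoch = acceptedEpoch;
        this.currentEpoch = currentEpoch;
    }

    // Assumed rule: accept a NEWEPOCH only if it proposes a newer epoch.
    // Accepting it records a tried epoch and leaves currentEpoch untouched.
    boolean onNewEpoch(long proposedEpoch) {
        if (proposedEpoch <= acceptedEpoch) {
            return false;
        }
        acceptedEpoch = proposedEpoch;
        return true;
    }

    // Assumed rule: accept a NEWLEADER only for the epoch already accepted.
    // Only now does the epoch count as joined.
    boolean onNewLeader(long epoch) {
        if (epoch != acceptedEpoch) {
            return false;
        }
        currentEpoch = epoch;
        return true;
    }

    long acceptedEpoch() { return acceptedEpoch; }
    long currentEpoch() { return currentEpoch; }
}
```

For example, a follower starting at epoch 3 that accepts NEWEPOCH(4) has
acceptedEpoch 4 but still currentEpoch 3 until NEWLEADER(4) arrives, so an
attempt that never reaches quorum does not change the joined epoch.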
More details can be found in this report: _*ZooKeeper's atomic broadcast
protocol: Theory and practice*_, André Medeiros, March 20, 2012.
> add a documentation about currentEpoch and acceptEpoch
> ------------------------------------------------------
>
> Key: ZOOKEEPER-3608
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3608
> Project: ZooKeeper
> Issue Type: Improvement
> Components: documentation, leaderElection, server
> Reporter: maoling
> Assignee: maoling
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)