[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyq000 updated ZOOKEEPER-2959:
----------------------------------
    Description: 
Once the ZooKeeper cluster finishes the election of a new leader, all learners 
report their accepted epochs to the leader, which uses them to compute the new 
cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
    private final HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch)
            throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException(
                            "Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation produces a result once:
# The leader has called the method "getEpochToPropose"
# The number of reporters is greater than half the number of participants.

The problem is that an observer server also sends its accepted epoch to the 
leader, and this procedure treats observers as participants.

Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and 
that the leader and the observer have reported their accepted epochs while 
neither of the followers has. The connectingFollowers set then contains two 
elements. Its size of 2 is greater than half of the 3 participants, so 
QuorumVerifier#containsQuorum returns true, because it does not check whether 
each element of its argument is a participant.
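
The scenario above can be reproduced with a minimal sketch. It assumes the 
verifier only compares the size of the set against half the participant count, 
as the description states. The class NaiveMajorityDemo, the constant 
PARTICIPANTS and the SID values 1 and 4 are hypothetical and are not taken from 
ZooKeeper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a majority-based verifier; not ZooKeeper's class.
class NaiveMajorityDemo {
    // Voting participants in the scenario: 1 leader + 2 followers.
    static final int PARTICIPANTS = 3;

    // Assumed check: only the size of the set is compared against half of
    // the participants; the role of each SID is never inspected.
    static boolean containsQuorum(Set<Long> sids) {
        return sids.size() > PARTICIPANTS / 2;
    }

    public static void main(String[] args) {
        long leaderSid = 1L;    // hypothetical SID of the leader
        long observerSid = 4L;  // hypothetical SID of the observer

        Set<Long> connectingFollowers = new HashSet<>();
        connectingFollowers.add(leaderSid);
        connectingFollowers.add(observerSid);

        // Only one participant has reported, yet the check passes.
        System.out.println(containsQuorum(connectingFollowers)); // prints true
    }
}
```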


  was:
Once the ZooKeeper cluster finishes the election of a new leader, all learners 
report their accepted epochs to the leader, which uses them to compute the new 
cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
    private final HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch)
            throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException(
                            "Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation produces a result once:
# The leader has called the method 
# The number of reporters is greater than half quorum, i.e., half the number 
of PARTICIPANTS.

The problem is that an observer server is not a PARTICIPANT, yet this procedure 
treats observers as participants.

Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and 
that the leader and the observer have reported their epochs while neither of 
the followers has. The connectingFollowers set then contains two elements. Its 
size of 2 is greater than half of the 3 participants, so the if condition is 
met.

This procedure can be confusing. 
# The connectingFollowers set can contain the SIDs of observers. (In fact, it 
must contain at least the SID of the leader.)
# The intent of QuorumVerifier#containsQuorum is to check whether a set of 
PARTICIPANTS forms a quorum. Here, however, it simply treats a set of peers as 
a set of participants.

There are 2 candidate solutions.
# Ignore epochs from observers.
# Require (number_of_reported_peers > number_of_all_peers / 2) instead of the 
existing (number_of_reported_peers > number_of_all_participants / 2).
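
The two candidates can be compared with a minimal sketch. The class 
CandidateFixesDemo, both method names and the SID values are hypothetical 
illustrations, not ZooKeeper APIs, and integer division is assumed for the 
"/ 2" in both checks.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the two candidate solutions.
class CandidateFixesDemo {
    // Candidate 1: discard SIDs that are not participants, then require
    // more than half of the participants.
    static boolean quorumIgnoringObservers(Set<Long> reported,
                                           Set<Long> participants) {
        Set<Long> voters = new HashSet<>(reported);
        voters.retainAll(participants);
        return voters.size() > participants.size() / 2;
    }

    // Candidate 2: count every reporter, but compare against all peers.
    static boolean quorumOverAllPeers(Set<Long> reported, int allPeers) {
        return reported.size() > allPeers / 2;
    }

    public static void main(String[] args) {
        // Hypothetical SIDs: 1 = leader, 2 and 3 = followers, 4 = observer.
        Set<Long> participants = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        int allPeers = 4;

        // Only the leader and the observer have reported.
        Set<Long> reported = new HashSet<>(Arrays.asList(1L, 4L));
        System.out.println(quorumIgnoringObservers(reported, participants)); // prints false
        System.out.println(quorumOverAllPeers(reported, allPeers));          // prints false

        // One follower reports as well.
        reported.add(2L);
        System.out.println(quorumIgnoringObservers(reported, participants)); // prints true
        System.out.println(quorumOverAllPeers(reported, allPeers));          // prints true
    }
}
```

Under these assumptions both candidates reject the leader-plus-observer case in 
the 4-peer example. The second check could still pass with a single participant 
when observers outnumber participants, for instance with 3 participants and 4 
observers once the leader and 3 observers have reported (4 > 7 / 2).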

A similar confusion exists in the subsequent step, where the leader counts the 
ACKs for the new epoch from learners.


> ignore epoch proposal and ack from observers when a newly elected leader 
> computes new epoch
> -------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2959
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11
>            Reporter: xiangyq000
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
