Karolos Antoniadis created ZOOKEEPER-3537:
---------------------------------------------
Summary: Leader election - Use of out of election messages
Key: ZOOKEEPER-3537
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3537
Project: ZooKeeper
Issue Type: Improvement
Reporter: Karolos Antoniadis
Assignee: Karolos Antoniadis
Hello ZooKeeper developers,
In {{lookForLeader}} of {{FastLeaderElection}}, the following switch block
handles a received notification message {{n}} whose {{n.state}} is either
{{FOLLOWING}} or {{LEADING}}
([https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1029]).
{code:java}
case FOLLOWING:
case LEADING:
    /*
     * Consider all notifications from the same epoch
     * together.
     */
    if (n.electionEpoch == logicalclock.get()) {
        recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch,
                n.peerEpoch));
        voteSet = getVoteTracker(recvset, new Vote(n.version, n.leader, n.zxid,
                n.electionEpoch, n.peerEpoch, n.state));
        if (voteSet.hasAllQuorums() && checkLeader(outofelection, n.leader,
                n.electionEpoch)) {
            setPeerState(n.leader, voteSet);
            Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
            leaveInstance(endVote);
            return endVote;
        }
    }
    /*
     * Before joining an established ensemble, verify that
     * a majority are following the same leader.
     */
    outofelection.put(n.sid, new Vote(n.version, n.leader, n.zxid,
            n.electionEpoch, n.peerEpoch, n.state));
    voteSet = getVoteTracker(outofelection, new Vote(n.version, n.leader, n.zxid,
            n.electionEpoch, n.peerEpoch, n.state));
    if (voteSet.hasAllQuorums() && checkLeader(outofelection, n.leader,
            n.electionEpoch)) {
        synchronized (this) {
            logicalclock.set(n.electionEpoch);
            setPeerState(n.leader, voteSet);
        }
        Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
        leaveInstance(endVote);
        return endVote;
    }
    break;
{code}
We notice that when {{n.electionEpoch == logicalclock.get()}}, the vote is
added to {{recvset}}, yet {{checkLeader}} is called immediately afterwards
with the votes in {{outofelection}}, as can be seen here
([https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1037]).
Checking {{outofelection}} instead of {{recvset}} does not cause any problems:
if {{checkLeader}} fails on {{outofelection}} although it would have succeeded
on {{recvset}}, it succeeds immediately afterwards, once the vote has been
added to {{outofelection}}.
Still, it seems more natural to check for a leader in {{recvset}} rather than
in {{outofelection}}.
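
To make the argument concrete, here is a minimal, self-contained sketch that replays a single {{LEADING}} notification from the leader in the same election epoch. It is not the ZooKeeper code: {{SimpleVote}}, {{State}}, {{replay}} and the two-argument {{checkLeader}} are hypothetical stand-ins, and the check is modelled only for a peer that is not itself the proposed leader (the map must contain a vote sent by the leader that reports {{LEADING}}). The sketch also assumes that the vote stored in {{recvset}} carries the sender's state; in the snippet above the {{recvset}} vote is built without {{n.state}}, so an actual change might need to pass it as well.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckLeaderSketch {
    // Hypothetical stand-in for the server states used in the election.
    enum State { LOOKING, FOLLOWING, LEADING }

    /** Hypothetical, stripped-down vote: proposed leader plus the sender's state. */
    static final class SimpleVote {
        final long leader;
        final State state;

        SimpleVote(long leader, State state) {
            this.leader = leader;
            this.state = state;
        }
    }

    /**
     * Simplified model of the leader check for a peer that is not itself the
     * proposed leader: the map must hold a vote sent by the leader, and that
     * vote must report the LEADING state.
     */
    static boolean checkLeader(Map<Long, SimpleVote> votes, long leader) {
        SimpleVote fromLeader = votes.get(leader);
        return fromLeader != null && fromLeader.state == State.LEADING;
    }

    /** Replays one same-epoch LEADING notification and returns the three check results. */
    static boolean[] replay(long leaderSid) {
        Map<Long, SimpleVote> recvset = new HashMap<>();
        Map<Long, SimpleVote> outofelection = new HashMap<>();
        SimpleVote n = new SimpleVote(leaderSid, State.LEADING);

        // Same-epoch branch: the vote is stored in recvset first.
        recvset.put(leaderSid, n);
        boolean firstOutOfElection = checkLeader(outofelection, leaderSid); // check as in the snippet
        boolean viaRecvset = checkLeader(recvset, leaderSid);               // suggested check

        // Fall-through: the same vote is added to outofelection and checked again.
        outofelection.put(leaderSid, n);
        boolean secondOutOfElection = checkLeader(outofelection, leaderSid);

        return new boolean[] {firstOutOfElection, viaRecvset, secondOutOfElection};
    }

    public static void main(String[] args) {
        boolean[] r = replay(1L);
        System.out.println("outofelection, first check:  " + r[0]); // false
        System.out.println("recvset, suggested check:    " + r[1]); // true
        System.out.println("outofelection, second check: " + r[2]); // true
    }
}
```

In this model the first check on {{outofelection}} fails only because the leader's vote has not been copied there yet, and the second check succeeds right after the copy, which matches the reasoning above. The quorum test ({{hasAllQuorums}}) is deliberately left out of the sketch.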
Cheers,
Karolos