[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

Vishal K (JIRA) Fri, 25 Mar 2011 11:26:48 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011330#comment-13011330
 ]


Vishal K commented on ZOOKEEPER-975:
------------------------------------

{quote}   1. It does not have a test, but I'm not entirely convinced we should 
try to implement one, since it might be complex. We should think about it, 
though;
{quote}

Yes, I didn't include a test. We should have a test, but I don't think I can 
get to it in the next couple of days or so. I was thinking of feeding the 
recvQueue with dummy Notifications for scenarios that we wanted to test and 
then call FastLeaderElection.lookForLeader() and verify the outcome. The tricky 
part is to come-up with the right sequence of Notifications to test all corner 
cases.

{quote}
   2. I don't understand why you're removing the outofelection data structure
. I believe the notifications from peers that are FOLLOWING/LEADING should be 
treated separately from the the ones of peers that are LOOKING. If a peer 
obtains notifications from a quorum saying that the peers are 
LEADING/FOLLOWING, then it should follow the leader they point to. If a peer 
obtains notifications from a quorum of FOLLOWING peers, then it should follow 
the protocol to select a leader. Consequently, the notifications should be 
treated separately.
{quote}

I was sure about that change either. I don't mind reintroducing outofelection. 
But I couldn't think of scenarios where inserting Notification in recvset 
instead of outofelection will be a problem. Considering that termPredicate 
verifies that majority vote for same <epoch, sid, zxid>, can you describe a 
scenario where this change could cause a problem?

> new peer goes in LEADING state even if ensemble is online
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-975
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-975
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.2
>            Reporter: Vishal K
>            Assignee: Vishal K
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-975.patch, ZOOKEEPER-975.patch, 
> ZOOKEEPER-975.patch2
>
>
> Scenario:
> 1. 2 of the 3 ZK nodes are online
> 2. Third node is attempting to join
> 3. Third node unnecessarily goes in "LEADING" state
> 4. Then third goes back to LOOKING (no majority of followers) and finally 
> goes to FOLLOWING state.
> While going through the logs I noticed that a peer C that is trying to
> join an already formed cluster goes in LEADING state. This is because
> QuorumCnxManager of A and B sends the entire history of notification
> messages to C. C receives the notification messages that were
> exchanged between A and B when they were forming the cluster.
> In FastLeaderElection.lookForLeader(), due to the following piece of
> code, C quits lookForLeader assuming that it is supposed to lead.
> 740                             //If have received from all nodes, then 
> terminate
> 741                             if ((self.getVotingView().size() == 
> recvset.size()) &&
> 742                                     
> (self.getQuorumVerifier().getWeight(proposedLeader) != 0)){
> 743                                 self.setPeerState((proposedLeader == 
> self.getId()) ?
> 744                                         ServerState.LEADING: 
> learningState());
> 745                                 leaveInstance();
> 746                                 return new Vote(proposedLeader, 
> proposedZxid);
> 747
> 748                             } else if (termPredicate(recvset,
> This can cause:
> 1.  C to unnecessarily go in LEADING state and wait for tickTime * initLimit 
> and then restart the FLE.
> 2. C waits for 200 ms (finalizeWait) and then considers whatever
> notifications it has received to make a decision. C could potentially
> decide to follow an old leader, fail to connect to the leader, and
> then restart FLE. See code below.
> 752                             if (termPredicate(recvset,
> 753                                     new Vote(proposedLeader, proposedZxid,
> 754                                             logicalclock))) {
> 755 
> 756                                 // Verify if there is any change in the 
> proposed leader
> 757                                 while((n = recvqueue.poll(finalizeWait,
> 758                                         TimeUnit.MILLISECONDS)) != null){
> 759                                     if(totalOrderPredicate(n.leader, 
> n.zxid,
> 760                                             proposedLeader, 
> proposedZxid)){
> 761                                         recvqueue.put(n);
> 762                                         break;
> 763                                     }
> 764                                 }
> In general, this does not affect correctness of FLE since C will
> eventually go back to FOLLOWING state (A and B won't vote for
> C). However, this delays C from joining the cluster. This can in turn
> affect recovery time of an application.
> Proposal: A and B should send only the latest notification (most
> recent) instead of the entire history. Does this sound reasonable?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

Reply via email to