[
https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424388#comment-13424388
]
Flavio Junqueira commented on ZOOKEEPER-1514:
---------------------------------------------
Hi Henry, I believe I understand the point you're raising, so perhaps I'm not
making myself clear. Let me try to add more detail.
This null check:
{noformat}
if (listener != null) {
    listener.start();
} else {
    LOG.error("Null listener when initializing cnx manager");
    Assert.fail("Failed to create cnx manager");
}
{noformat}
appears in a number of places in the code, essentially every time we use the
listener. The first time it appeared was in
QuorumPeer.createElectionAlgorithm() due to findbugs warnings as I mentioned
before (ZOOKEEPER-407).
When we created a mock server for FLELostMessageTest, we simply copied the
part that starts a listener. The check currently appears in at least a couple
of places, and if I remove it from this patch, we should also remove it from
the others. Removing it from the other parts of the code is outside the scope
of this issue, though, so if you feel strongly about this change, I suggest we
leave the patch with the check in and discuss removing the null check in a
separate jira. That way we make uniform changes across the code without mixing
the issues.
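To make the "uniform change" option concrete, here is a minimal sketch of what centralizing the check could look like, so that a follow-up jira would only need to touch one place. Everything in it is hypothetical: ListenerGuard, Startable and startOrFail are illustrative names that do not exist in ZooKeeper, and throwing IllegalStateException in place of the LOG.error/Assert.fail pair is an assumption made only to keep the example self-contained.

```java
/**
 * Hypothetical helper (not part of ZooKeeper): one place to hold the
 * null check that is currently copied wherever the listener is started.
 */
final class ListenerGuard {

    /** Stand-in for the election listener; only start() matters here. */
    interface Startable {
        void start();
    }

    private ListenerGuard() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Starts the listener, or fails loudly if it is null.
     * The exception type is an assumption; the copied test code uses
     * LOG.error followed by Assert.fail instead.
     */
    static void startOrFail(Startable listener) {
        if (listener == null) {
            throw new IllegalStateException(
                    "Null listener when initializing cnx manager");
        }
        listener.start();
    }
}
```

With something along these lines, each call site would shrink to a single call to startOrFail, and dropping or changing the check later would be a one-line edit rather than a sweep across the code and the tests.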
> FastLeaderElection - leader ignores the round information when joining a
> quorum
> -------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.4
> Reporter: Patrick Hunt
> Assignee: Flavio Junqueira
> Priority: Critical
> Fix For: 3.4.4, 3.5.0, 3.3.7
>
> Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch,
> ZOOKEEPER-1514.patch
>
>
> In the following case we have a 3 server ensemble.
> Initially all is well, zk3 is the leader.
> However zk3 fails, restarts, and rejoins the quorum as the new leader (was
> the old leader, still the leader after re-election)
> The existing two followers, zk1 and zk2 rejoin the new quorum again as
> followers of zk3.
> zk1 then fails, its data directory is deleted (so it has no state
> whatsoever), and it is restarted. However, zk1 can never rejoin the quorum
> (even after an hour).
> During this time zk2 and zk3 are serving properly.
> Later, all three servers are restarted and properly form a functional
> quorum.
> Here are some interesting log snippets. Nothing else of interest was seen in
> the logs during this time:
> zk3. This is where it becomes the leader after failing initially (as the
> leader). Notice the "round" is ahead of zk1 and zk2:
> {noformat}
> 2012-07-18 17:19:35,423 - INFO
> [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id = 3,
> Proposed zxid = 77309411648
> 2012-07-18 17:19:35,423 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648
> (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480
> (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480
> (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] -
> LEADING
> {noformat}
> zk1 which won't come back. Notice that zk3 is reporting the round as 831,
> while zk2 thinks that the round is 832:
> {noformat}
> 2012-07-18 17:31:12,015 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648
> (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:12,016 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480
> (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:12,017 - INFO [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648
> (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:15,219 - INFO
> [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out:
> 6400
> {noformat}