[ https://issues.apache.org/jira/browse/ZOOKEEPER-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718439#comment-16718439 ]
John Kim commented on ZOOKEEPER-2461:
-------------------------------------
I think this and ZOOKEEPER-2555 are describing the same issue.
Anyway, we ran into something close to what [~nerdyyatrice] described (running
3.4.12), but with a slight difference in order:
Observers took about 10-15ms less time to complete their shutdown than
Followers. The difference seems to lie in the time between when CommitProcessor
is shut down and when SyncRequestProcessor is shut down.
So around step 2, the Followers had not yet completed their shutdown and were
sending back election notifications with the n.zxid and n.leader that had been
used/set in the previous election. One part I'm not sure of is that the observer
logs do not mention any of the notifications the participants sent after they
started their own elections, even though those should have arrived within the
200ms window. Perhaps this is where the bug lies.
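The "200ms window" presumably refers to the finalize wait in FastLeaderElection (the `finalizeWait` constant): once a quorum agrees on a vote, the server keeps polling for further notifications, and a better vote arriving inside the window reopens the election. The following is a minimal Python model of that window, assuming ZooKeeper's ordering of votes by (epoch, zxid, server id). `Vote`, `better` and `finalize` are hypothetical names, quorum weights are ignored, and this is a sketch rather than ZooKeeper's actual Java code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FINALIZE_WAIT_MS = 200  # the 200ms window (finalizeWait in FastLeaderElection)

@dataclass(frozen=True)
class Vote:
    leader: int  # proposed leader's server id (n.leader)
    zxid: int    # proposed leader's last zxid (n.zxid)
    epoch: int   # proposed leader's peer epoch

def better(new: Vote, cur: Vote) -> bool:
    """True if `new` wins: higher epoch, then higher zxid, then higher server id."""
    return (new.epoch, new.zxid, new.leader) > (cur.epoch, cur.zxid, cur.leader)

def finalize(proposal: Vote, arrivals: List[Tuple[int, Vote]]) -> Optional[Vote]:
    """Model the wait after a quorum agrees on `proposal` (hypothetical helper).

    `arrivals` holds (ms after agreement, vote) pairs. Each poll waits up to
    FINALIZE_WAIT_MS for the next notification; if none arrives, the proposal
    is final. A better vote inside the window returns None (the election
    continues); worse votes are ignored and the wait restarts.
    """
    last_ms = 0
    for at_ms, vote in sorted(arrivals, key=lambda a: a[0]):
        if at_ms - last_ms > FINALIZE_WAIT_MS:
            break  # window elapsed with no notification: decision stands
        if better(vote, proposal):
            return None
        last_ms = at_ms
    return proposal

p = Vote(leader=3, zxid=100, epoch=1)
print(finalize(p, [(50, Vote(leader=5, zxid=120, epoch=1))]))   # None
print(finalize(p, [(250, Vote(leader=5, zxid=120, epoch=1))]))  # Vote(leader=3, zxid=100, epoch=1)
```

In this model, a notification that never reaches the observer inside the window cannot change its decision, which is why the missing log entries for the participants' notifications look suspicious.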
So the observers try to follow the old leader, eventually time out, and
complete another round of election.
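A toy timeline model of the race described above, assuming (as in FastLeaderElection) that a server which is not yet LOOKING answers a LOOKING peer with the leader it currently believes in. `Peer`, `reply_leader` and `observer_choice` are hypothetical, the majority rule is a simplification of the real quorum checks, and the 12/14 ms shutdown times are illustrative values inside the 10-15ms gap mentioned above, not measurements.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Peer:
    sid: int                 # this follower's server id
    old_leader: int          # leader from the previous election (n.leader)
    shutdown_done_ms: float  # hypothetical: when it finishes shutdown and starts LOOKING

def reply_leader(peer: Peer, now_ms: float) -> int:
    """Leader reported by `peer` when a LOOKING observer contacts it at `now_ms`."""
    if now_ms < peer.shutdown_done_ms:
        # Still FOLLOWING: answer with the stale vote from the previous election.
        return peer.old_leader
    # Already LOOKING: a participant starting an election first proposes itself.
    return peer.sid

def observer_choice(followers: List[Peer], query_ms: float) -> Optional[int]:
    """Leader the observer adopts, or None if no majority of replies agree."""
    replies = [reply_leader(f, query_ms) for f in followers]
    leader, count = Counter(replies).most_common(1)[0]
    return leader if count > len(followers) // 2 else None

# Old leader (sid 1) crashed; the observer finished shutdown at t=0 ms,
# while the followers finish theirs 12 and 14 ms later.
followers = [Peer(sid=2, old_leader=1, shutdown_done_ms=12.0),
             Peer(sid=3, old_leader=1, shutdown_done_ms=14.0)]
print(observer_choice(followers, query_ms=0.0))   # 1 (stale replies point at the old leader)
print(observer_choice(followers, query_ms=20.0))  # None (fresh replies, no agreement yet)
```

Querying at t=0 yields the crashed leader, so the observer would try to follow it until it times out; querying after the followers have finished shutting down yields fresh votes instead.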
> There is no difference between the observer and the participants in the
> leader election algorithm
> -------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2461
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2461
> Project: ZooKeeper
> Issue Type: Improvement
> Components: quorum
> Affects Versions: 3.5.0
> Reporter: Ryan Zhang
> Assignee: Ryan Zhang
> Priority: Major
> Fix For: 3.6.0, 3.5.5
>
>
> We have observed a case where, when a leader machine crashes hard, non-voting
> learners take a long time to detect the new leader. After looking at the
> details more carefully, we identified one potential improvement (and one bug
> fixed in 3.5).