[
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264368#comment-14264368
]
Hadoop QA commented on ZOOKEEPER-1865:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12654667/ZOOKEEPER-1865.patch
against trunk revision 1646992.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3)
warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2475//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2475//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2475//console
This message is automatically generated.
> Fix retry logic in Learner.connectToLeader()
> ---------------------------------------------
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Thawan Kooburat
> Assignee: Edward Carter
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod
> ensembles. Here is a description of the event.
> Before the old leader went down, it was able to announce its notification
> message, so 3 out of 5 servers (including the old leader) elected the old
> leader as the new leader for the next epoch. While the old leader was being
> rebooted, 2 other machines kept trying to connect to it, so the quorum
> couldn't form until those 2 machines gave up and moved to the next round of
> leader election.
> This is because Learner.connectToLeader() uses simple retry logic. The
> contract for this method is that it should never spend longer than initLimit
> trying to connect to the leader. In our outage, each sock.connect() call
> probably blocked for initLimit, and it is called 5 times.
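
For illustration only, here is a minimal sketch of a retry loop that keeps to the contract described above: every attempt is given only the time that remains in one overall budget, so five attempts cannot add up to five times the limit. This is not the attached patch and not ZooKeeper code. The class BoundedConnect, the Connector hook, and the parameters budgetMs, maxAttempts and pauseMs are all hypothetical names, and the budget is assumed to be initLimit already converted to milliseconds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;

public class BoundedConnect {

    /** Hypothetical hook so the sketch can run without a real network. */
    public interface Connector {
        void connect(InetSocketAddress addr, int timeoutMs) throws IOException;
    }

    /** Milliseconds left until the deadline, clamped to [0, Integer.MAX_VALUE]. */
    public static int remainingMs(long deadlineNanos, long nowNanos) {
        long remaining = (deadlineNanos - nowNanos) / 1_000_000L;
        return (int) Math.max(0L, Math.min(remaining, Integer.MAX_VALUE));
    }

    /**
     * Tries to connect up to maxAttempts times, but never lets the sum of all
     * attempts and pauses exceed budgetMs. Returns the attempt that succeeded.
     */
    public static int connectWithBudget(Connector connector,
                                        InetSocketAddress addr,
                                        int budgetMs,
                                        int maxAttempts,
                                        long pauseMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + budgetMs * 1_000_000L;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Each attempt gets only what is left of the overall budget.
            int timeoutMs = remainingMs(deadline, System.nanoTime());
            if (timeoutMs <= 0) {
                break; // budget used up; a timeout of 0 would mean "wait forever"
            }
            try {
                connector.connect(addr, timeoutMs);
                return attempt;
            } catch (IOException e) {
                last = e; // remember the failure and retry if time remains
            }
            // Pause between attempts, but never past the deadline.
            long sleepMs = Math.min(pauseMs, remainingMs(deadline, System.nanoTime()));
            if (sleepMs > 0) {
                Thread.sleep(sleepMs);
            }
        }
        throw new IOException("could not connect within " + budgetMs + " ms", last);
    }
}
```

With a real socket, the Connector would presumably wrap java.net.Socket.connect(SocketAddress, int), which accepts a per-call timeout in milliseconds. The zero check matters because that method treats a timeout of 0 as infinite.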
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)