Ian Babrou created ZOOKEEPER-1515:
-------------------------------------
Summary: Long reconnect timeout if leader failed.
Key: ZOOKEEPER-1515
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1515
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection, quorum, server
Affects Versions: 3.3.5
Environment: Gentoo Linux, but every environment is affected.
Reporter: Ian Babrou
In ZooKeeper 3.3.5, the file
src/java/main/org/apache/zookeeper/server/quorum/Learner.java:325 contains:
Thread.sleep(1000);
This sleep is always hit after a leader failure or restart. ZooKeeper elects a new
leader and all followers try to connect to it, but the first attempt always fails
with "Connection refused":
{quote}
2012-07-23 18:55:48,159 - WARN [QuorumPeer:/0.0.0.0:2181:Learner@229] -
Unexpected exception, tries=0, connecting to web329.local/192.168.1.74:2888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:221)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
{quote}
I propose replacing this line with the following code:
{quote}
if (tries > 0) {
    Thread.sleep(self.tickTime);
}
{quote}
This way the first reconnect attempt is made immediately, and later attempts wait
for one tick time (a sensible semantic change, I think).
As a result, leader re-election time dropped from >1500 ms to 300-400 ms with a
50 ms tick time. This is quite important for our production environment and will
not break any existing installations.
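To show the intended behaviour in isolation, here is a minimal, standalone Java sketch. It is not the ZooKeeper code: the class ConnectRetrySketch, the ConnectAttempt and Sleeper interfaces, and the connectWithRetry method are hypothetical names made up for illustration. It assumes the sleep sits at the end of a retry loop counted by tries, so that skipping it when tries == 0 makes the first retry immediate.

```java
import java.io.IOException;

class ConnectRetrySketch {

    /** Hypothetical stand-in for one socket connect attempt. */
    interface ConnectAttempt {
        void connect() throws IOException;
    }

    /** Hypothetical hook so the delay can be observed or replaced in tests. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Tries to connect up to maxTries times.
     * After the first failure (tries == 0) the loop retries immediately;
     * after every later failure it waits tickTimeMs before retrying.
     * Returns the number of failed attempts that preceded the success.
     */
    static int connectWithRetry(ConnectAttempt attempt, int maxTries,
                                long tickTimeMs, Sleeper sleeper)
            throws IOException, InterruptedException {
        for (int tries = 0; tries < maxTries; tries++) {
            try {
                attempt.connect();
                return tries;
            } catch (IOException e) {
                if (tries == maxTries - 1) {
                    throw e; // out of attempts, give up
                }
            }
            // The proposed change: no delay after the very first failure.
            if (tries > 0) {
                sleeper.sleep(tickTimeMs);
            }
        }
        throw new IOException("maxTries must be positive");
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated peer that refuses the first connection only.
        ConnectAttempt flaky = () -> {
            if (calls[0]++ == 0) {
                throw new IOException("Connection refused");
            }
        };
        int failed = connectWithRetry(flaky, 5, 50L, Thread::sleep);
        System.out.println("connected after " + failed + " failed attempt(s)");
    }
}
```

With a peer that refuses only the first connection, the second attempt goes out with no delay at all; the tick-time wait only starts to apply from the second failure onwards.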