[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-822:
---------------------------------------

    Attachment: ZOOKEEPER-822.patch

I believe the patch I'm attaching achieves the same goal and is even simpler, 
but I'd like to make sure it suits your needs, Vishal.

If you agree with the modifications, I can work on a test. I also think the 
2-second timeout you used before is too low, so I've raised it to 5 seconds. 
Rather than argue about which value is ideal, though, I'd prefer to read it 
from a system property with a default value of at least 5 seconds.
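
To make the system property idea concrete, here is a minimal sketch of how 
the timeout could be resolved. It is not taken from the attached patch: the 
class name CnxTimeoutConfig and the property name "zookeeper.cnxTimeout" are 
placeholders I picked for illustration, and the fallback for non-positive 
values is my own assumption.

```java
// Sketch only: the class name and property name are hypothetical placeholders,
// not necessarily what the patch uses.
final class CnxTimeoutConfig {
    // Assumed property name, chosen for illustration.
    static final String CNX_TIMEOUT_PROPERTY = "zookeeper.cnxTimeout";

    // Default of 5 seconds, expressed in milliseconds.
    static final int DEFAULT_CNX_TIMEOUT_MS = 5000;

    private CnxTimeoutConfig() {
    }

    // Returns the configured timeout in milliseconds. Integer.getInteger
    // returns the default when the property is unset or not a valid integer;
    // non-positive values also fall back to the default (an assumption here).
    static int cnxTimeoutMs() {
        int value = Integer.getInteger(CNX_TIMEOUT_PROPERTY, DEFAULT_CNX_TIMEOUT_MS);
        return value > 0 ? value : DEFAULT_CNX_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println("connection timeout (ms): " + cnxTimeoutMs());
    }
}
```

With something like this, the value could be overridden at startup by passing 
-Dzookeeper.cnxTimeout=10000 to the JVM (again, assuming that property name), 
and the result handed to Socket.connect(SocketAddress, int) as the timeout.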

I also commit to redesigning QuorumCnxManager for either 3.4.0 or 4.0.0 to use 
a selector or some other approach we agree upon. I've been wanting to do it for 
a while anyway, and I thought there was a JIRA open for it, but I can't find 
it right now.

> Leader election taking a long time to complete
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, 
> test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, 
> ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3-node cluster.
> 1. Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds, leader election takes anywhere from 25 to 60 seconds to 
> finish. Note: we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from the two nodes that took a long time to form the 
> cluster after failing the leader. The leader was down anyway, so logs from 
> that node shouldn't matter.
> Look for "START HERE". The logs after that point are the ones of interest.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
