[jira] [Commented] (ZOOKEEPER-1678) Server fails to join quorum when a peer is unreachable (5 ZK server setup)

JL (JIRA) Thu, 28 Mar 2013 17:49:16 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616914#comment-13616914
 ]


JL commented on ZOOKEEPER-1678:
-------------------------------

This probably means that the "WorkerSender" thread is blocked for that long in 
connect() (in connectOne), and thus cannot contact enough peers (and get 
responses) before 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader() times 
out, which is capped ({{maxNotificationInterval}}) at 60 secs.

In theory, the connect should take no longer than tickTime * syncLimit, which 
in this case are set to: {{tickTime=2000}}, {{initLimit=10}}
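
To illustrate the point about a bounded connect, here is a minimal sketch. It is 
not ZooKeeper's code: the class BoundedConnect and the helpers connectTimeoutMs 
and tryConnect are hypothetical names made up for this example. The values 2000 
and 10 are the ones quoted above; which limit actually applies to the election 
connect is an assumption. It only shows how passing an explicit timeout to 
java.net.Socket.connect() keeps one unreachable peer from blocking the caller 
for the OS default connect timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BoundedConnect {

    // Upper bound for a single connect attempt, in milliseconds.
    // Assumption: the bound is tickTime multiplied by a limit expressed in ticks.
    static int connectTimeoutMs(int tickTimeMs, int limitTicks) {
        return tickTimeMs * limitTicks;
    }

    // Attempt a TCP connect, giving up after timeoutMs rather than
    // waiting for the operating system's default connect timeout.
    static boolean tryConnect(String host, int port, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // SocketTimeoutException is an IOException, so a timed-out
            // attempt is reported here as a failed connect.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        int timeoutMs = connectTimeoutMs(2000, 10);
        System.out.println("per-connect bound: " + timeoutMs + " ms");

        // A local listener stands in for a reachable peer.
        try (ServerSocket server = new ServerSocket(0)) {
            boolean ok = tryConnect("127.0.0.1", server.getLocalPort(), timeoutMs);
            System.out.println("reachable peer connected: " + ok);
        }
    }
}
```

With the values above the bound works out to 2000 * 10 = 20000 ms, i.e. 20 
secs per attempt, which is below the 60 secs cap mentioned for lookForLeader().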
                
> Server fails to join quorum when a peer is unreachable (5 ZK server setup)
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1678
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1678
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>         Environment: java version "1.6.0_32"
> Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
> Distributor ID:       Ubuntu
> Description:  Ubuntu 12.04.1 LTS
> Release:      12.04
> Codename:     precise
> uname -a Linux ha-vani3-0 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 
> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: JL
>
> In a 5-node ZK cluster setup, in the following state:
> * 1 host is down / not reachable.
> * 4 hosts are up.
> * 3 ZK servers are in quorum.
> * a 4th ZK server was restarted and is trying to re-join the quorum.
> The 4th server is not able to rejoin the quorum because the connection to the 
> unreachable host is not established, and apparently takes too long to time out.
> Stack traces and additional information coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
