[jira] [Updated] (ZOOKEEPER-2783) follower disconnects and cannot reconnect

Ben Sherman (JIRA) Fri, 12 May 2017 17:47:33 -0700

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ben Sherman updated ZOOKEEPER-2783:
-----------------------------------
    Attachment:     (was: fail3.log)

> follower disconnects and cannot reconnect
> -----------------------------------------
>
>                 Key: ZOOKEEPER-2783
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2783
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.10
>         Environment: centos 7, AWS EC2
>            Reporter: Ben Sherman
>         Attachments: fail3.log, fail5.log
>
>
> We have a 5-node cluster running 3.4.10 (we saw this in 3.4.8 and 3.4.9 as 
> well). Sometimes a node gets a read timeout, drops all its connections, and 
> tries to re-establish itself in the quorum.  It can usually do this in a few 
> seconds, but last night it took almost 15 minutes to reconnect.
> These are 5 servers in AWS, and we've tried tuning the timeouts, but they are 
> exceeding any reasonable timeout and still failing.
> In the attached logs, server 5 is a follower and server 3 is the leader.  
> Server 5 loses connectivity at 11:21:34, and server 3 sees the disconnect at 
> the same moment.
> 5 tries to re-establish the quorum, but cannot do it until the connections to 
> the other servers expire at 11:37:02.  After the connections are 
> re-established, 5 connects immediately.
> At 11:41:08, the operator restarted the server, and it reconnected normally.
> I suspect there is a problem with stale connections to the rest of the quorum 
> - the other services on this box were fine (monitoring, puppet) and able to 
> establish new connections with no problems.
> I posed this problem to the zookeeper-users list and was asked to open a 
> ticket.
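
One way to test the stale-connection suspicion described above is to check, from the affected node, whether brand-new TCP connections to the other quorum members succeed while the existing ones are stuck. The following is a minimal sketch of such a probe, not part of ZooKeeper itself. The function names, host names, and ports are hypothetical; 2888 and 3888 are only the ports commonly used in example configurations, so substitute whatever the server.N lines in zoo.cfg specify.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a fresh TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe_peers(peers, timeout=3.0):
    """Map each (host, port) pair to whether a new connection could be opened."""
    return {peer: can_connect(peer[0], peer[1], timeout) for peer in peers}


# Hypothetical usage, with assumed host names and ports:
# peers = [("zk1.example.internal", 3888), ("zk3.example.internal", 2888)]
# for peer, ok in probe_peers(peers).items():
#     print(peer, "reachable" if ok else "unreachable")
```

If every peer is reachable with a new connection while the follower still cannot rejoin, that would be consistent with the problem lying in the existing connections rather than in the network path. It does not prove it, since the probe only completes a TCP handshake and exchanges no ZooKeeper traffic.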



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
