[
https://issues.apache.org/jira/browse/ZOOKEEPER-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew January resolved ZOOKEEPER-3149.
---------------------------------------
Resolution: Duplicate
> Unreachable node can prevent remaining nodes from gaining quorum
> ----------------------------------------------------------------
>
> Key: ZOOKEEPER-3149
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3149
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.12
> Reporter: Andrew January
> Priority: Minor
>
> Steps to reproduce:
> # Set up a 3-node cluster with node 2 as the leader and node 3's zxid
> ahead of node 1's, so that node 3 will become the new leader when node 2
> disappears.
> # Shut down node 2 such that it is unreachable and attempts to connect to it
> yield a socket timeout.
> # Make sure the remaining two nodes get "Connection refused" responses
> almost immediately when one tries to connect to the other on a port that
> isn't open.
> Expected behaviour:
> The remaining nodes reach quorum.
> Actual behaviour:
> The remaining nodes repeatedly fail to reach quorum, spinning and holding
> elections until node 2 is brought back.
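The reproduction depends on two different connection failure modes: a closed port on a reachable host is refused almost immediately, while an unreachable host only fails once the connect timeout expires. The following is a minimal Python sketch of that distinction, not ZooKeeper code; the helper names `classify_connect` and `closed_local_port` are hypothetical and introduced only for illustration.

```python
import socket


def classify_connect(host, port, timeout_s):
    """Attempt one TCP connect and report how it ended.

    Hypothetical helper for illustration; not part of ZooKeeper.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "connected"
    except ConnectionRefusedError:
        # The host is reachable but nothing listens on the port:
        # the failure is reported almost immediately.
        return "refused"
    except socket.timeout:
        # No answer at all (e.g. packets silently dropped):
        # the failure is reported only after timeout_s has elapsed.
        return "timeout"


def closed_local_port():
    """Return a local port number that currently has no listener."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))   # let the OS pick a free port
    port = s.getsockname()[1]
    s.close()                  # released without ever listening
    return port


if __name__ == "__main__":
    print(classify_connect("127.0.0.1", closed_local_port(), 1.0))
```

Under this sketch, a node that is up but not yet listening behaves like the closed local port, whereas a node that has been shut down so that it is unreachable would fall into the "timeout" branch and hold the caller for the full timeout.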
>
> This is because:
> # An election for a new leader starts.
> # Both nodes broadcast notifications to all the other nodes.
> # Each sender delivers its notification to node 1 quickly, then tries to
> send it to node 2, which takes cnxTimeout (default 5s) before timing out,
> and only then sends it to node 3. As a result, every notification to node 3
> takes 5 seconds to arrive.
> # Despite the delays, node 1 and node 3 agree that node 3 should be leader.
> # Node 1 sends the message that it will follow node 3, then immediately
> tries to connect to it as leader.
> # Because of the delay, node 3 hasn't yet received the notification that
> node 1 is following it, so it hasn't started accepting requests.
> # This causes the requests from node 1 to fail quickly with "Connection
> refused".
> # Node 1 retries 5 times (pausing a second between each attempt).
> # Because the retries are spaced at 1/5th of cnxTimeout, node 1 exhausts
> them before node 3 starts accepting requests, so it gives up trying to
> follow node 3 and starts a new election.
> # Node 3 times out waiting for node 1 to acknowledge it as leader, and
> starts a new election.
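The race described in the steps above can be sketched as a small timing model. This is a toy simplification in Python, not ZooKeeper's election code: the function `follower_connects` and its parameters are hypothetical, refused connections are assumed to cost no time, "retries 5 times" is read as five attempts in total, and node 3 is assumed to start accepting exactly when the delayed notification arrives.

```python
def follower_connects(cnx_timeout_s, attempts=5, retry_pause_s=1.0):
    """Return True if node 1 reaches node 3 before running out of attempts.

    Toy model of the reported race; all names and timings are assumptions
    taken from the description above, not from ZooKeeper's source.
    """
    # Node 3 starts accepting followers only once node 1's notification
    # arrives, which is held up by the blocked send to unreachable node 2.
    leader_ready_at = cnx_timeout_s

    # Node 1 tries at t = 0, 1, 2, ... seconds; each refused attempt is
    # assumed to fail instantly, so only the pauses consume time.
    attempt_times = [i * retry_pause_s for i in range(attempts)]

    # The follower succeeds only if some attempt happens at or after the
    # moment the leader is ready.
    return any(t >= leader_ready_at for t in attempt_times)


if __name__ == "__main__":
    for timeout_s in (5.0, 4.0):
        print(timeout_s, follower_connects(timeout_s))
```

With the default of 5 seconds the last attempt lands at t = 4 and the model reports failure; with 4 seconds the last attempt coincides with node 3 becoming ready and the model reports success, which matches the observation that lowering cnxTimeout below 5 avoids the loop.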
>
> We can work around the issue by decreasing cnxTimeout to less than 5 seconds.
> However, it seems like a bad idea to rely on tweaking a value based on
> network performance, especially as the value is only configurable via JVM
> args rather than the conf files.