Matan created ZOOKEEPER-2081:
--------------------------------

             Summary: Leader election cannot complete when a node is blackholed 
(unreachable) even when quorum is possible.
                 Key: ZOOKEEPER-2081
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2081
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, quorum
    Affects Versions: 3.4.6, 3.3.6
         Environment: Verified on RHEL and Mac OS X.
            Reporter: Matan


I ran into this when one of our 3-node clusters on RHEL lost a machine to a 
PSU failure. The remaining two nodes failed to complete leader election and 
kept restarting the leader election process.
Restarting the nodes did not help; they ended up in exactly the same state.

This was curious, so I spent some time reproducing it on my local machine and 
found what looks like the main factor:
When a node is unreachable (connection attempts time out), the election process 
somehow gets out of sync.  Once a leader is decided, the follower only ever 
tries to connect to the leader while the leader is not listening.
The follower then gives up and the process starts again, ad infinitum.

How to reproduce on a local machine:

1. Set up a 3-node ZK cluster.  Note we only need to set up 2 boxes since 
we'll just make the third unreachable:

MyId 1:

server.1=MyMachine:2881:3881
server.2=<Put any IP that we can block>:2882:3882
server.3=MyMachine:2883:3883

MyId 3:

server.1=MyMachine:2881:3881
server.2=<Put any IP that we can block>:2882:3882
server.3=MyMachine:2883:3883
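
For anyone reproducing this, the two configs above can be generated with a small 
script. This is only a sketch: the server.N lines and ports come from the setup 
above, while the tickTime/initLimit/syncLimit values, the client ports, the data 
directory layout, and the helper names render_config and write_node are 
assumptions made for illustration.

```python
from pathlib import Path

# Hosts and quorum/election ports are taken from the report.
# A host of None marks the server that will be blackholed.
SERVERS = {
    1: ("MyMachine", 2881, 3881),
    2: (None, 2882, 3882),
    3: ("MyMachine", 2883, 3883),
}


def render_config(my_id, blocked_ip, data_root, client_port_base=2180):
    """Return zoo.cfg text for one node (timing values are assumed, not from the report)."""
    data_dir = Path(data_root) / f"zk{my_id}"
    lines = [
        "tickTime=2000",   # assumed value
        "initLimit=10",    # assumed value
        "syncLimit=5",     # assumed value
        f"dataDir={data_dir}",
        # Both nodes share one machine, so each needs its own client port.
        f"clientPort={client_port_base + my_id}",
    ]
    for sid, (host, quorum_port, election_port) in sorted(SERVERS.items()):
        lines.append(f"server.{sid}={host or blocked_ip}:{quorum_port}:{election_port}")
    return "\n".join(lines) + "\n"


def write_node(my_id, blocked_ip, data_root):
    """Create the data dir, the myid file, and zoo.cfg for one node; return the cfg path."""
    data_dir = Path(data_root) / f"zk{my_id}"
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / "myid").write_text(f"{my_id}\n")
    cfg = data_dir / "zoo.cfg"
    cfg.write_text(render_config(my_id, blocked_ip, data_root))
    return cfg
```

Calling write_node(1, ...) and write_node(3, ...) with the IP you intend to block 
gives the two nodes used in the steps here; server 2 is never started.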

Now set up a blackhole route for the IP you chose (Mac OS X; Linux is similar):
> route add -host <IP you selected> 127.0.0.1 -blackhole

Start your 2 nodes.  They will never reach quorum.

However, if I remove the blackhole route and simply do not start the 3rd 
instance (so the host is still reachable), it works fine and quorum is reached 
almost immediately.

It seems that whether a connection attempt ends in a "timeout" or a 
"connection refused" somehow makes all the difference to the election process.
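
To confirm which of the two failure modes a given peer address produces from a 
node's point of view, a plain TCP probe is enough. The sketch below is not 
ZooKeeper code; the function name probe and the 3-second timeout are 
assumptions chosen for illustration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connect attempt as connected, refused, timeout, or other error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Host is reachable but nothing listens on the port: fails immediately.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No reply at all (e.g. a blackholed host): fails only after the timeout.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

With the blackhole route in place, probing the blocked IP on port 3882 should 
take the full timeout before failing, whereas a reachable host with no ZK 
instance running should fail right away with "refused".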

I verified this behavior on 3.4.6 and 3.3.6.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
