chenbo created ZOOKEEPER-2776:
---------------------------------
Summary: Election Failed when (medium id) node down
Key: ZOOKEEPER-2776
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2776
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.4.6
Reporter: chenbo
We found a bug in ZooKeeper leader election in our environment. It can be
reproduced easily in a 3-node cluster with default settings.
# Assume the ZooKeeper service is down on all nodes and node 3 has a larger
zxid than node 1. This makes node 3 the potential leader.
# Take node 2 down (or drop all of its incoming packets with a firewall).
# Start the ZooKeeper service on node 1 and node 3.
The ZooKeeper cluster cannot be established in this case. The following
behavior can be found and verified in the logs:
# Notifications to node 2 always time out.
# Node 3 is always leading but always fails with (Timeout while waiting for
epoch from quorum). It rarely gets a follower during this period.
# Node 1 is always following but always fails to connect to the leader. It
gives up after 5 attempts, and then another election round starts, again and
again.
# Node 3 decides to become the leader 1s after node 1 gives up contacting it.
# Node 3 always receives Notification packets 5s after node 1.
We then analyzed the zookeeper-3.4.6 source code and found:
# During election, ZooKeeper sends leader election messages sequentially, with
a connection timeout of 5s by default. This causes a 5s receive delay for nodes
whose id comes after the down node's id. Those nodes get the same election
notification 5s later than the nodes whose id is smaller than the down node's.
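The delay described above can be illustrated with a minimal sketch. It is a simplified model, not ZooKeeper code: the function name `delivery_times` is hypothetical, and it assumes that peers are contacted one at a time in ascending id order and that a down peer blocks the sender for the full 5s connection timeout.

```python
CONNECT_TIMEOUT_S = 5.0  # default election connection timeout reported above


def delivery_times(peer_ids, down_ids, connect_timeout=CONNECT_TIMEOUT_S):
    """Return {peer_id: seconds until one notification reaches that peer}.

    Assumption: peers are contacted sequentially in ascending id order.
    A down peer blocks the sender for the full connect timeout and
    receives nothing.
    """
    elapsed = 0.0
    delivered = {}
    for peer in sorted(peer_ids):
        if peer in down_ids:
            elapsed += connect_timeout  # sender waits for the timeout
            continue
        delivered[peer] = elapsed
    return delivered


# 3-node cluster with node 2 down: node 3 gets the notification 5s late.
print(delivery_times([1, 2, 3], down_ids={2}))  # {1: 0.0, 3: 5.0}
```

In this model only peers with an id above the down node's are delayed, which matches the 5s gap seen between node 1 and node 3.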
In the case described above, node 3 recognized the situation and entered
LEADING status 5s after node 1 decided to follow it. Follower node 1 tried to
connect to the leader 5 times at a 1s interval (hard-coded). This means every
follower gives up connecting to the leader after 4s. By the time the follower
gave up, node 3 had not even become the leader.
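The timing mismatch can be checked with a small sketch. This is an illustrative model only: the helper names are hypothetical, each connection attempt is assumed to fail instantly, and the 5 attempts, 1s interval, and 5s leader delay are the values reported above.

```python
def follower_attempt_times(start, attempts=5, interval=1.0):
    """Times (in seconds) at which the follower tries to reach the leader."""
    return [start + i * interval for i in range(attempts)]


def follower_reaches_leader(follower_start, leader_start,
                            attempts=5, interval=1.0):
    """True if at least one attempt happens once the leader is LEADING."""
    tries = follower_attempt_times(follower_start, attempts, interval)
    return any(t >= leader_start for t in tries)


# Node 1 starts following at t=0; node 3 enters LEADING at t=5.
tries = follower_attempt_times(0.0)
print(tries)                              # [0.0, 1.0, 2.0, 3.0, 4.0]
print(follower_reaches_leader(0.0, 5.0))  # False
print(5.0 - tries[-1])                    # 1.0
```

The last attempt at 4s comes 1s before node 3 starts leading, which agrees with the 1s gap seen in the logs. In this model a sixth attempt, or a leader delay of 4s or less, would be enough for the follower to connect.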
So, is there any way to configure around or bypass this problem?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)