Jeffrey F. Lukman commented on ZOOKEEPER-2865:

Hi [~shralex],

Actually, we saw that server 4 sent an UPTODATE message to server 3
after server 4 became the leader of the cluster.
But since server 3 had not opened its port to receive messages from server 4
(and server 5), server 3 never received the UPTODATE message and
therefore failed permanently to recover from this isolated state.

Server 3 failed to open the port to receive messages from server 4 and server 5
because it still did not know the latest configuration from server 2 (previous …).
Its own configuration contains only servers 1, 2, and 3.
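To make the failure mode concrete, here is a minimal toy model of the scenario, under a loudly hypothetical simplification: a server only accepts messages from peers listed in its locally known configuration. The names `Server`, `deliver`, `is_synced`, and `simulate_uptodate_broadcast` are illustrative only and are not ZooKeeper APIs; real ZooKeeper connection handling is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical toy peer: it only accepts messages from servers in its
    locally known configuration (a simplification, not ZooKeeper's code)."""
    sid: int
    known_config: set[int]
    inbox: list[tuple[int, str]] = field(default_factory=list)

    def deliver(self, sender: int, msg: str) -> bool:
        # A peer that does not list the sender in its configuration never
        # opened a channel for it, so the message is dropped.
        if sender not in self.known_config:
            return False
        self.inbox.append((sender, msg))
        return True

    def is_synced(self) -> bool:
        # A follower finishes syncing once it has seen UPTODATE.
        return any(m == "UPTODATE" for _, m in self.inbox)


def simulate_uptodate_broadcast() -> dict[int, bool]:
    """Leader 4 broadcasts UPTODATE; report which followers end up synced."""
    new_config = {1, 2, 3, 4, 5}
    followers = [
        # Server 3 missed the reconfiguration: stale view {1, 2, 3}.
        Server(sid=3, known_config={1, 2, 3}),
        # Server 5 learned the new configuration.
        Server(sid=5, known_config=set(new_config)),
    ]
    leader = 4
    for f in followers:
        f.deliver(sender=leader, msg="UPTODATE")
    return {f.sid: f.is_synced() for f in followers}


print(simulate_uptodate_broadcast())  # {3: False, 5: True}
```

In this model, nothing ever retries delivery or refreshes server 3's configuration, so server 3 stays unsynced indefinitely, mirroring the permanent isolation described above.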

> Reconfig Causes Inconsistent Configuration file among the nodes
> ---------------------------------------------------------------
>                 Key: ZOOKEEPER-2865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2865
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum, server
>    Affects Versions: 3.5.3
>            Reporter: Jeffrey F. Lukman
>         Attachments: ZK-2865.pdf
> When we ran our Distributed system Model Checking (DMCK) on ZooKeeper v3.5.3
> following the workload in ZK-2778:
> - initially start 2 ZooKeeper nodes
> - start 3 new nodes
> - perform a reconfiguration (the complete reconfiguration is in the attached
> document)
> We think our DMCK found the following bug:
> - while one of the newly joined nodes (called node X) had not yet received
> the latest configuration update, the initial leader node closed its port,
> thereby isolating node X.
> For complete information on the bug, please see the attached document.

This message was sent by Atlassian JIRA
