[ https://issues.apache.org/jira/browse/ZOOKEEPER-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115308#comment-16115308 ]
Alexander Shraer commented on ZOOKEEPER-2865:
---------------------------------------------

Thanks [~jeffreyflukman]!

We don't try to guarantee that every member of the new config receives the proposal message of a reconfiguration (only a quorum needs to ack) and don't wait until either of them receive the COMMIT before completing the reconfig (to be compatible with other ZK operations, I didn't want to introduce another round of message exchange).

But what's required is for the cluster to be able to recover from this state: the server that didn't get the commit in your scenario should find out about the new config and eventually join the cluster. If that doesn't happen, then that is potentially a bug, but it's not clear from the description here.

> Reconfig Causes Inconsistent Configuration file among the nodes
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2865
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum, server
>    Affects Versions: 3.5.3
>            Reporter: Jeffrey F. Lukman
>         Attachments: ZK-2865.pdf
>
>
> When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
> by following the workload in ZK-2778:
> - initially start 2 ZooKeeper nodes
> - start 3 new nodes
> - do a reconfiguration (the complete reconfiguration is attached in the
>   document)
> We think our DMCK found the following bug:
> - while one of the just-joined nodes (called node X) has not received the
>   latest configuration update, the initial leader node closed its port,
>   therefore causing node X to be isolated.
> For complete information on the bug, please see the attached document.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
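
As an illustration of the point in the comment that only a quorum needs to ack, here is a minimal sketch in Python. It is not ZooKeeper code: the function names, the server ids, and the plain strict-majority rule are assumptions made for the example, and the real reconfig protocol has more to it than this check.

```python
def has_quorum(acks, members):
    """Return True when a strict majority of `members` has acknowledged."""
    return len(set(acks) & set(members)) > len(members) // 2


def lagging_members(acks, members):
    """Return the members that have not acknowledged, in sorted order."""
    return sorted(set(members) - set(acks))


# Hypothetical five-member new config; server 5 never saw the proposal.
new_config = [1, 2, 3, 4, 5]
acks = [1, 2, 3, 4]

print(has_quorum(acks, new_config))       # True: 4 of 5 is a majority
print(lagging_members(acks, new_config))  # [5]: must learn the config later
```

In this sketch the reconfig can complete even though server 5 is behind, which matches the state the comment says the cluster must be able to recover from.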