Sirius created ZOOKEEPER-4712: --------------------------------- Summary: Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency Key: ZOOKEEPER-4712 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4 Reporter: Sirius
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor. It may lead to potential data inconsistency (see Potential Risk). A follower / observer will invoke syncProcessor.shutdown() in LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), respectively. However, after the [fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22] of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore. h4. Call stack h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ... * *(Buggy)* Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean) * *(Buggy)* Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean) * (For comparison) Leader.shutdown(String)->LeaderZooKeeper.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean) h5. For comparison, in version 3.4.X, * Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean) * Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean) h4. Code Details Take version 3.8.0 as an example. In Follower.shutdown() : public void shutdown() { LOG.info("shutdown Follower"); + // invoke Learner.shutdown() super.shutdown(); } In Learner.java: public void shutdown() { ... // shutdown previous zookeeper if (zk != null) { // If we haven't finished SNAP sync, force fully shutdown // to avoid potential inconsistency + // This will invoke ZooKeeperServer.shutdown(boolean), + // which will not shutdown syncProcessor + // Before the fix of ZOOLEEPER-3642, + // FollowerZooKeeperServer.shutdown() will be invoked here zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); } } In ZooKeeperServer.java: public synchronized void shutdown(boolean fullyShutDown) { ... if (firstProcessor != null) { + // For a follower, this will not shutdown its syncProcessor. firstProcessor.shutdown(); } ... } In expectation, Follower.shutdown() should invoke LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor: public synchronized void shutdown() { ... try { + // shutdown the syncProcessor here if (syncProcessor != null) { syncProcessor.shutdown(); } } ... } Observer.shutdown() has the similar problem. h4. Potential Risk When Follower.shutdown() is called, the follower's QuorumPeer thread may update its lastProcessedZxid for the election before its syncThread drains the pending requests and flushes them to disk. In consequence, this lastProcessedZxid is not the latest zxid in its log, leading to log inconsistency after the SYNC phase. (Similar to the symptoms of ZOOKEEPER-2845.) -- This message was sent by Atlassian Jira (v8.20.10#820010)