Sirius created ZOOKEEPER-4712:
---------------------------------

             Summary: Follower.shutdown() and Observer.shutdown() do not 
correctly shutdown the syncProcessor, which may lead to potential data 
inconsistency
                 Key: ZOOKEEPER-4712
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum, server
    Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
            Reporter: Sirius


Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* 
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* 
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * (For comparison) 
Leader.shutdown(String)->LeaderZooKeeper.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X, 
 * 
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 

h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    }
In Learner.java:
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));  

        }
    }
In ZooKeeperServer.java:
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    }
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    }
Observer.shutdown() has the similar problem. 

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk. 

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to