[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4712:
--------------------------------------
    Labels: pull-request-available  (was: )

> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor, which may lead to data inconsistency
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4712
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>            Reporter: Sirius
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
> Risk{*}).
>  
> A follower / observer will invoke syncProcessor.shutdown() in 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
> respectively.
> However, after the 
> [FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
>  of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() 
> anymore.
>  
> h2. Call stack
> h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
>  * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
> ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)
>  
> h5. For comparison, in version 3.4.X,
>  * Observer.shutdown() -> Learner.shutdown() -> 
> {*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * Follower.shutdown() -> Learner.shutdown() -> 
> {*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  
> h2. Code Details
> Take version 3.8.0 as an example.
> In Follower.shutdown() :
> {code:java}
>     public void shutdown() {
>         LOG.info("shutdown Follower");
> +       // invoke Learner.shutdown()
>         super.shutdown();   
>     } {code}
>  
> In Learner.java:
> {code:java}
>     public void shutdown() {
>         ...
>         // shutdown previous zookeeper
>         if (zk != null) {
>             // If we haven't finished SNAP sync, force fully shutdown
>             // to avoid potential inconsistency
> +           // This will invoke ZooKeeperServer.shutdown(boolean), 
> +           // which will not shutdown syncProcessor
> +           // Before the fix of ZOOLEEPER-3642, 
> +           // FollowerZooKeeperServer.shutdown() will be invoked here
>             zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); 
>          }
>     } {code}
>  
> In ZooKeeperServer.java:
> {code:java}
>     public synchronized void shutdown(boolean fullyShutDown) {
>         ...
>         if (firstProcessor != null) {
> +           // For a follower, this will not shutdown its syncProcessor.
>             firstProcessor.shutdown(); 
>         }
>         ...
>     } {code}
>  
> In expectation, Follower.shutdown() should invoke 
> LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
> {code:java}
>     public synchronized void shutdown() {
>         ...
>         try {
> +           // shutdown the syncProcessor here
>             if (syncProcessor != null) {
>                 syncProcessor.shutdown();     
>             }
>         } ...
>     } {code}
> Observer.shutdown() has the similar problem.
>  
> h2. Potential Risk
> When Follower.shutdown() is called, the follower's QuorumPeer thread may 
> update the lastProcessedZxid for the election and recovery phase before its 
> syncThread drains the pending requests and flushes them to disk.
> In consequence, this lastProcessedZxid is not the latest zxid in its log, 
> leading to log inconsistency after the SYNC phase. (Similar to the symptoms 
> of ZOOKEEPER-2845.)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to