[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
------------------------------
    Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}

 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}

Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)


> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor, which may lead to potential data inconsistency
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4712
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>            Reporter: Sirius
>            Priority: Critical
>
> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor. It may lead to potential data inconsistency (see Potential 
> Risk).
>  
> A follower / observer will invoke syncProcessor.shutdown() in 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
> respectively.
> However, after the 
> [fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
>  of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
> Follower.shutdown() / Observer.shutdown() will not invoke 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() 
> anymore.
>  
> h4. Call stack
> h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
>  * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
> ->ZooKeeperServer.shutdown(boolean)
>  * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
> ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)
>  
> h5. For comparison, in version 3.4.X,
>  * Observer.shutdown() -> Learner.shutdown() -> 
> {*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * Follower.shutdown() -> Learner.shutdown() -> 
> {*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  
> h4. Code Details
> Take version 3.8.0 as an example.
> In Follower.shutdown() :
> {code:java}
>     public void shutdown() {
>         LOG.info("shutdown Follower");
> +       // invoke Learner.shutdown()
>         super.shutdown();   
>     } {code}
>  
> In Learner.java:
> {code:java}
>     public void shutdown() {
>         ...
>         // shutdown previous zookeeper
>         if (zk != null) {
>             // If we haven't finished SNAP sync, force fully shutdown
>             // to avoid potential inconsistency
> +           // This will invoke ZooKeeperServer.shutdown(boolean), 
> +           // which will not shutdown syncProcessor
> +           // Before the fix of ZOOLEEPER-3642, 
> +           // FollowerZooKeeperServer.shutdown() will be invoked here
>             zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); 
>          }
>     } {code}
>  
> In ZooKeeperServer.java:
> {code:java}
>     public synchronized void shutdown(boolean fullyShutDown) {
>         ...
>         if (firstProcessor != null) {
> +           // For a follower, this will not shutdown its syncProcessor.
>             firstProcessor.shutdown(); 
>         }
>         ...
>     } {code}
>  
> In expectation, Follower.shutdown() should invoke 
> LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
> {code:java}
>     public synchronized void shutdown() {
>         ...
>         try {
> +           // shutdown the syncProcessor here
>             if (syncProcessor != null) {
>                 syncProcessor.shutdown();     
>             }
>         } ...
>     } {code}
> Observer.shutdown() has the similar problem.
>  
> h4. Potential Risk
> When Follower.shutdown() is called, the follower's QuorumPeer thread may 
> update its lastProcessedZxid for the election before its syncThread drains 
> the pending requests and flushes them to disk.
> In consequence, this lastProcessedZxid is not the latest zxid in its log, 
> leading to log inconsistency after the SYNC phase. (Similar to the symptoms 
> of ZOOKEEPER-2845.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to