[ https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141235#comment-13141235 ]

Camille Fournier commented on ZOOKEEPER-1264:
---------------------------------------------

From a comment I added to the tracker that this change was attached to:
ZOOKEEPER-1136 causes a concurrency bug. Specifically:
1. Follower rejoins and gets a SNAP from the leader
2. Follower gets the NEWLEADER message and takes a snapshot
3. Follower gets some additional transactions forwarded from the leader and 
applies them directly to the data tree
4. Follower gets the UPTODATE message and does not take a snapshot
5. Follower starts following, writes some new transactions to its log, and is 
killed before it takes another snapshot
6. Follower restarts and gets a DIFF from the leader

The transactions that arrived between NEWLEADER and UPTODATE are lost because 
they are applied only to the in-memory data tree and never written to the 
transaction log. If that tree isn't snapshotted and the follower restarts and 
resyncs with only a DIFF, those transactions are never recovered.

I think the proper thing to do is to snapshot after UPTODATE, but I'm not sure 
why we changed this to snapshot after NEWLEADER instead. The wiki doesn't seem 
to explain that clearly.
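The six-step sequence above can be reproduced with a toy model. This is a hypothetical sketch, not ZooKeeper's Learner/Follower code: the Follower class, the zxid numbering, and the simulate() and leaderHistory() helpers are invented for illustration. A snapshot is modeled as a copy of the data tree, a restart rebuilds state from snapshot plus log only, and a DIFF is modeled as "the leader sends every zxid above the follower's last zxid". With the snapshot taken at NEWLEADER, zxids 4 and 5 (forwarded between NEWLEADER and UPTODATE) are missing after the restart; taking it at UPTODATE keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy model of the resync race; NOT ZooKeeper's real classes.
class ResyncLossSketch {

    static final class Follower {
        TreeSet<Long> dataTree = new TreeSet<>(); // in-memory state: zxids applied
        TreeSet<Long> snapshot = new TreeSet<>(); // last snapshot on disk
        List<Long> txnLog = new ArrayList<>();    // txns persisted to the log

        void takeSnapshot() {
            snapshot = new TreeSet<>(dataTree);
        }

        // Restart: in-memory state is rebuilt from disk (snapshot + log) only.
        void restart() {
            dataTree = new TreeSet<>(snapshot);
            dataTree.addAll(txnLog);
        }

        long lastZxid() {
            return dataTree.isEmpty() ? 0L : dataTree.last();
        }
    }

    // Hypothetical helper: leader's committed history is zxids 1..upTo.
    static List<Long> leaderHistory(long upTo) {
        List<Long> h = new ArrayList<>();
        for (long z = 1; z <= upTo; z++) h.add(z);
        return h;
    }

    // Walks through steps 1-6 and returns the zxids the follower ends up with.
    static TreeSet<Long> simulate(boolean snapshotAtUptodate, List<Long> leader) {
        Follower f = new Follower();
        // 1. SNAP: follower receives the leader's state up to zxid 3
        for (long z = 1; z <= 3; z++) f.dataTree.add(z);
        // 2. NEWLEADER: current behavior snapshots here
        if (!snapshotAtUptodate) f.takeSnapshot();
        // 3. zxids 4 and 5 forwarded before UPTODATE: applied to the tree, never logged
        f.dataTree.add(4L);
        f.dataTree.add(5L);
        // 4. UPTODATE: proposed behavior snapshots here instead
        if (snapshotAtUptodate) f.takeSnapshot();
        // 5. following: zxids 6 and 7 are logged, then the follower is killed
        for (long z = 6; z <= 7; z++) {
            f.dataTree.add(z);
            f.txnLog.add(z);
        }
        // 6. restart, then DIFF: leader sends only zxids above the follower's last zxid
        f.restart();
        long last = f.lastZxid();
        for (long z : leader) {
            if (z > last) f.dataTree.add(z);
        }
        return f.dataTree;
    }

    public static void main(String[] args) {
        List<Long> leader = leaderHistory(8); // leader committed zxid 8 while follower was down
        System.out.println("snapshot at NEWLEADER: " + simulate(false, leader));
        System.out.println("snapshot at UPTODATE:  " + simulate(true, leader));
    }
}
```

In this model the first line prints [1, 2, 3, 6, 7, 8] and the second prints [1, 2, 3, 4, 5, 6, 7, 8]. The DIFF starts from zxid 7 (the last logged transaction), so nothing ever resends 4 and 5 unless they were captured in a snapshot.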
                
> FollowerResyncConcurrencyTest failing intermittently
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-1264
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.3.3, 3.4.0, 3.5.0
>            Reporter: Patrick Hunt
>            Assignee: Camille Fournier
>            Priority: Blocker
>             Fix For: 3.3.4, 3.4.0, 3.5.0
>
>         Attachments: ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> I saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>        at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>        at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>        at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}

