[
https://issues.apache.org/jira/browse/ZOOKEEPER-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141234#comment-13141234
]
Camille Fournier commented on ZOOKEEPER-1136:
---------------------------------------------
This change causes a concurrency bug. Specifically:
1. Follower rejoins, gets snap from leader
2. Follower gets NEWLEADER message and takes a snapshot
3. Follower gets some additional transactions forwarded from leader, applies
these directly to data tree
4. Follower gets an UPTODATE message, does not take a snapshot
5. Follower starts following, writes some new transactions to its log, and is
killed before it takes another snapshot
6. Follower restarts and gets a DIFF from the leader
The transactions that came in between NEWLEADER and UPTODATE are lost because
they are applied only to the in-memory data tree and are never persisted
anywhere else. If that tree isn't snapshotted and the follower restarts with
only a DIFF, those transactions are gone.
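
The sequence above can be sketched as a toy model. This is not ZooKeeper code:
the Follower class, its method names, and the idea that the DIFF is computed
from the last logged zxid are simplifying assumptions made only for
illustration. The sketch runs the six steps twice, once snapshotting only on
NEWLEADER and once also snapshotting on UPTODATE, and reports which
transactions are missing after the restart.

```python
class Follower:
    """Hypothetical toy follower: in-memory tree plus on-disk snapshot and log."""

    def __init__(self, snapshot_on_uptodate):
        self.snapshot_on_uptodate = snapshot_on_uptodate
        self.tree = {}           # in-memory data tree, lost on a crash
        self.disk_snapshot = {}  # survives a crash
        self.disk_log = []       # survives a crash: (zxid, path) entries

    def take_snapshot(self):
        self.disk_snapshot = dict(self.tree)

    def receive_snap(self, leader_tree):      # step 1
        self.tree = dict(leader_tree)

    def on_newleader(self):                   # step 2
        self.take_snapshot()

    def apply_during_sync(self, zxid, path):  # step 3: tree only, not logged
        self.tree[path] = zxid

    def on_uptodate(self):                    # step 4
        if self.snapshot_on_uptodate:
            self.take_snapshot()

    def commit(self, zxid, path):             # step 5: logged, then applied
        self.disk_log.append((zxid, path))
        self.tree[path] = zxid

    def crash_and_restart(self):
        # Rebuild state from disk only and report the last logged zxid.
        self.tree = dict(self.disk_snapshot)
        for zxid, path in self.disk_log:
            self.tree[path] = zxid
        return max((zxid for zxid, _ in self.disk_log), default=0)


def missing_after_restart(snapshot_on_uptodate):
    # The leader holds six transactions, zxid 1..6, one node each.
    leader_txns = [(z, f"/node{z}") for z in range(1, 7)]
    f = Follower(snapshot_on_uptodate)
    f.receive_snap({path: z for z, path in leader_txns[:2]})
    f.on_newleader()
    for z, path in leader_txns[2:4]:
        f.apply_during_sync(z, path)
    f.on_uptodate()
    for z, path in leader_txns[4:6]:
        f.commit(z, path)
    last_zxid = f.crash_and_restart()
    # Step 6 (assumed): the DIFF carries only transactions newer than last_zxid.
    for z, path in leader_txns:
        if z > last_zxid:
            f.commit(z, path)
    return [z for z, path in leader_txns if path not in f.tree]


print(missing_after_restart(False))  # [3, 4]
print(missing_after_restart(True))   # []
```

In this model the two transactions applied between NEWLEADER and UPTODATE
disappear unless a snapshot is taken after UPTODATE, which matches the
scenario described in the steps.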
I think the proper thing to do is snapshot after UPTODATE, but I'm not sure why
we changed this to snapshot after NEWLEADER instead. The wiki doesn't seem to
explain that clearly. If one of you could check on
https://issues.apache.org/jira/browse/ZOOKEEPER-1264 and let me know the
reasoning, that would be helpful.
> NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the
> twiki
> -------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1136
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1136
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Benjamin Reed
> Assignee: Benjamin Reed
> Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1136.patch, ZOOKEEPER-1136.patch,
> ZOOKEEPER-1136.patch
>
>
> the NEW_LEADER message was sent at the beginning of the sync phase in Zab
> pre1.0, but it must be at the end in Zab 1.0. if the protocol is 1.0 or
> greater we need to queue rather than send the packet.