[
https://issues.apache.org/jira/browse/ZOOKEEPER-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ZOOKEEPER-3642:
--------------------------------------
Labels: pull-request-available (was: )
> Data inconsistency when the leader crashes right after sending SNAP sync
> ------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3642
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3642
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.0, 3.5.5, 3.5.6, 3.7.0
> Environment: Linux 4.19.29 x86_64
> Reporter: Alex Mirgorodskiy
> Assignee: Fangmin Lv
> Priority: Major
> Labels: pull-request-available
>
> If the leader crashes after sending a SNAP sync to a learner, but before
> sending the NEWLEADER message, the learner will not save the snapshot to
> disk. But it will advance its lastProcessedZxid to that from the snapshot
> (call it Zxid X).
> A new leader will get elected, and it will resync our learner again
> immediately. But this time, it will use the incremental DIFF method, starting
> from Zxid X. A DIFF-based resync does not trigger snapshots, so the learner
> is still holding the original snapshot purely in memory. If the learner
> restarts after that, it will silently lose all the data up to Zxid X.
> An easy way to reproduce is to insert System.exit into LearnerHandler.java
> right before sending the NEWLEADER message (on the one instance that is
> currently running the leader, but not the others):
> {noformat}
>      LOG.debug("Sending NEWLEADER message to " + sid);
> +    if (leader.self.getId() == 1 && sid == 3) {
> +        LOG.debug("Bail when server.1 resyncs server.3");
> +        System.exit(0);
> +    }
> {noformat}
> If I remember right, the repro steps are as follows. Run with that patch in a
> 4-instance ensemble where server.3 is an Observer, the rest are voting
> members, and server.1 is the current Leader. Start server.3 after the other
> instances are up. It will get the initial snapshot from server.1 and server.1
> will stop immediately because of the patch. Say, server.2 takes over as the
> new Leader. Server.3 will receive a DIFF resync from server.2, but will skip
> persisting the snapshot. A subsequent restart of server.3 will make that
> instance come up with a blank data tree.
> The above steps assumed that server.3 is an Observer, but it can presumably
> happen for voting members too; that would just need a 5-instance ensemble.
> Our workaround is to take the snapshot unconditionally on receiving NEWLEADER:
> {noformat}
> -    if (snapshotNeeded) {
> +    // Take the snapshot unconditionally. The first leader may have crashed
> +    // after sending us a SNAP, but before sending NEWLEADER. The second leader will
> +    // send us a DIFF, and we'd still like to take a snapshot, even though
> +    // the upstream code used to skip it.
> +    if (true || snapshotNeeded) {
>          zk.takeSnapshot();
>      }
> {noformat}
> This is what the 3.4.x series used to do. But I assume it is not the ideal fix,
> since it essentially disables the "snapshotNeeded" optimization.
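
The sequence described above (SNAP without NEWLEADER, then DIFF, then a restart) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ZooKeeper code: ToyLearner, its fields, and the run helper are hypothetical names, the "disk" is just a second map, and the transaction log is left out entirely. The alwaysSnapshot flag stands in for the "true ||" workaround from the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapThenDiffSketch {

    /** Hypothetical toy learner: an in-memory tree plus the last snapshot written to "disk". */
    static class ToyLearner {
        Map<String, String> memoryTree = new HashMap<>();
        Map<String, String> diskSnapshot = new HashMap<>();
        long lastProcessedZxid = 0;
        boolean snapshotNeeded = true;
        final boolean alwaysSnapshot; // models the "true ||" workaround

        ToyLearner(boolean alwaysSnapshot) {
            this.alwaysSnapshot = alwaysSnapshot;
        }

        /** SNAP sync: replace the in-memory tree, but write nothing to disk yet. */
        void receiveSnap(Map<String, String> data, long zxid) {
            memoryTree = new HashMap<>(data);
            lastProcessedZxid = zxid;
            snapshotNeeded = true;
        }

        /** DIFF sync: apply incremental changes; assumed to clear snapshotNeeded. */
        void receiveDiff(Map<String, String> changes, long zxid) {
            memoryTree.putAll(changes);
            lastProcessedZxid = zxid;
            snapshotNeeded = false;
        }

        /** NEWLEADER: the only point in this model where a snapshot reaches disk. */
        void receiveNewLeader() {
            if (alwaysSnapshot || snapshotNeeded) {
                diskSnapshot = new HashMap<>(memoryTree);
            }
        }

        /** Restart: the in-memory tree is rebuilt from the on-disk snapshot only. */
        void restart() {
            memoryTree = new HashMap<>(diskSnapshot);
        }
    }

    /** Replays the failure sequence and returns the learner's tree after a restart. */
    static Map<String, String> run(boolean alwaysSnapshot) {
        Map<String, String> leaderData = new HashMap<>();
        leaderData.put("/a", "1");
        leaderData.put("/b", "2");

        ToyLearner learner = new ToyLearner(alwaysSnapshot);
        // First leader sends SNAP at Zxid X = 100, then crashes before NEWLEADER.
        learner.receiveSnap(leaderData, 100L);
        // Second leader sees the learner at Zxid X and sends an empty DIFF.
        learner.receiveDiff(new HashMap<>(), 100L);
        learner.receiveNewLeader();
        learner.restart();
        return learner.memoryTree;
    }

    public static void main(String[] args) {
        System.out.println("snapshotNeeded only: " + run(false).size() + " nodes after restart");
        System.out.println("unconditional:       " + run(true).size() + " nodes after restart");
    }
}
```

In this toy model the first run comes back from the restart with an empty tree, while the second keeps both entries. The cost is the one noted above: with the flag set, every NEWLEADER triggers a snapshot, including after a DIFF that did not need one.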