[ https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568533#comment-16568533 ]
Hudson commented on ZOOKEEPER-3104:
-----------------------------------
FAILURE: Integrated in Jenkins build ZooKeeper-trunk #133 (See
[https://builds.apache.org/job/ZooKeeper-trunk/133/])
ZOOKEEPER-3104: Fix data inconsistency due to NEWLEADER being sent too (breed:
rev 148c2cd6ba73e66b1879a2e10ecda4ce4e0e2c7b)
* (edit) src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/Leader.java
> Potential data inconsistency due to NEWLEADER packet being sent too early
> during SNAP sync
> ------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.4, 3.6.0, 3.4.13
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.6.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Currently, in SNAP sync, the leader starts queuing the proposals/commits
> and the NEWLEADER packet before sending the snapshot over the wire. So it's
> possible that the zxid associated with the snapshot is higher than the
> zxids of all the packets queued before NEWLEADER.
>
> When the follower receives the snapshot, it applies all the txns queued
> before NEWLEADER, which may not cover all the txns up to the zxid in the
> snapshot. After that, it writes the snapshot out to disk with the zxid
> associated with the snapshot. If the server crashes after writing this
> out, then when loading the data from disk it will use the zxid of the
> snapshot file to sync with the leader, which could cause data
> inconsistency, because only part of the historical data was replayed
> during the previous sync.
>
> The NEWLEADER packet means the learner now has the correct and almost
> up-to-date state of the leader, so it makes more sense to send the
> NEWLEADER packet after sending the snapshot, and this is what the fix does.
>
> Besides this, the socket timeout is changed to the smaller sync timeout
> after the NEWLEADER ack is received. In high-write-traffic ensembles with
> large snapshots, the follower might be timed out by the leader before the
> queued txns have all been sent over after the snapshot is written out,
> which could leave the follower stuck in the syncing state forever. Moving
> the NEWLEADER packet after the snapshot avoids this issue as well.
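
The ordering problem described above can be illustrated with a toy model. This is only a sketch built from the issue description, not ZooKeeper code: the class SnapSyncOrdering, the methods buildQueue and uncoveredTxns, and the use of -1 as a stand-in for the NEWLEADER packet are all hypothetical, and zxids are simplified to consecutive integers starting at 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of packet ordering during SNAP sync; not ZooKeeper code. */
final class SnapSyncOrdering {
    /** Hypothetical marker standing in for the NEWLEADER packet. */
    static final long NEWLEADER = -1L;

    /**
     * Builds the leader's outgoing queue of txn zxids for one learner.
     * zxidAtSyncStart: last zxid committed when the leader starts queuing.
     * snapshotZxid: zxid associated with the snapshot that is sent.
     */
    static List<Long> buildQueue(boolean newLeaderAfterSnapshot,
                                 long zxidAtSyncStart, long snapshotZxid) {
        List<Long> queue = new ArrayList<>();
        for (long zxid = 1; zxid <= zxidAtSyncStart; zxid++) {
            queue.add(zxid);
        }
        if (!newLeaderAfterSnapshot) {
            // Old ordering: NEWLEADER is queued before the snapshot is sent.
            queue.add(NEWLEADER);
        }
        // Txns committed between the start of queuing and the snapshot.
        for (long zxid = zxidAtSyncStart + 1; zxid <= snapshotZxid; zxid++) {
            queue.add(zxid);
        }
        if (newLeaderAfterSnapshot) {
            // Fixed ordering: NEWLEADER is queued after the snapshot is sent.
            queue.add(NEWLEADER);
        }
        return queue;
    }

    /**
     * Counts the txns up to snapshotZxid that are NOT covered by the
     * packets queued before NEWLEADER.
     */
    static long uncoveredTxns(List<Long> queue, long snapshotZxid) {
        long highestApplied = 0;
        for (long packet : queue) {
            if (packet == NEWLEADER) {
                break; // the learner writes its snapshot at this point
            }
            highestApplied = Math.max(highestApplied, packet);
        }
        return Math.max(0, snapshotZxid - highestApplied);
    }

    public static void main(String[] args) {
        long start = 100;
        long snap = 120;
        System.out.println("old ordering gap:   "
                + uncoveredTxns(buildQueue(false, start, snap), snap));
        System.out.println("fixed ordering gap: "
                + uncoveredTxns(buildQueue(true, start, snap), snap));
    }
}
```

With the sample values in main, the old ordering leaves a gap of 20 txns between what was replayed and the zxid written with the snapshot, while the fixed ordering leaves a gap of 0. The model ignores commits, acks and timeouts entirely; it only shows why the position of NEWLEADER relative to the snapshot matters.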
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)