[ https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576482#comment-16576482 ]
Andor Molnar commented on ZOOKEEPER-3104:
-----------------------------------------

[~breed] [~lvfangmin]

Given that this is a critical bug in 3.4 and 3.5, why have you committed to trunk only?

> Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync
> ------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3104
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, in SNAP sync, the leader starts queuing the proposals/commits
> and the NEWLEADER packet before sending the snapshot over the wire. So it's
> possible that the zxid associated with the snapshot is higher than all
> the packets queued before NEWLEADER.
>
> When the follower receives the snapshot, it applies all the txns queued
> before NEWLEADER, which may not cover all the txns up to the zxid in the
> snapshot. After that, it writes the snapshot out to disk with the zxid
> associated with the snapshot. If the server crashes after writing this
> out, then when loading the data from disk it will use the zxid of the
> snapshot file to sync with the leader, and this could cause data
> inconsistency, because we only replayed part of the historical data during
> the previous sync.
>
> The NEWLEADER packet means the learner now has the correct and almost
> up-to-date state as the leader, so it makes more sense to move the NEWLEADER
> packet after sending over the snapshot, and this is what we did in the fix.
>
> Besides this, the socket timeout is changed to the smaller sync timeout
> after the NEWLEADER ack is received. In high write traffic ensembles with
> large snapshots, the follower might be timed out by the leader before
> finishing sending over those queued txns after writing the snapshot out,
> which could leave the follower in the syncing state forever. Moving the
> NEWLEADER packet after sending over the snapshot avoids this issue as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
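
For readers following the ordering argument in the quoted description, here is a rough toy model in Java. It is only a sketch under stated assumptions: the class and method names (SnapSyncOrderingSketch, ToyLeader, oldOrderCovers, newOrderCovers) and the starting zxid of 100 are hypothetical and are not taken from the ZooKeeper code base or from the actual patch. It checks one thing only: whether the txns queued before NEWLEADER reach the zxid tagged on the snapshot when writes keep arriving during the sync.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not ZooKeeper's Leader/LearnerHandler code.
final class SnapSyncOrderingSketch {

    static final class Packet {
        final String type; // "PROPOSAL" or "NEWLEADER"
        final long zxid;
        Packet(String type, long zxid) { this.type = type; this.zxid = zxid; }
    }

    static final class ToyLeader {
        long lastZxid;
        final List<Packet> queue = new ArrayList<>();
        ToyLeader(long lastZxid) { this.lastZxid = lastZxid; }

        // A committed write advances the zxid and is queued for the learner.
        void commitWrite() { lastZxid++; queue.add(new Packet("PROPOSAL", lastZxid)); }
        void queueNewLeader() { queue.add(new Packet("NEWLEADER", lastZxid)); }
        // The snapshot is tagged with the leader's zxid at the time it is sent.
        long sendSnapshot() { return lastZxid; }
    }

    // Highest txn zxid the learner would apply before it sees NEWLEADER.
    static long coveredBeforeNewLeader(List<Packet> queue, long baseline) {
        long covered = baseline;
        for (Packet p : queue) {
            if (p.type.equals("NEWLEADER")) break;
            covered = Math.max(covered, p.zxid);
        }
        return covered;
    }

    // Old ordering: NEWLEADER is queued before the snapshot is sent.
    static boolean oldOrderCovers(int writesBefore, int writesDuring) {
        ToyLeader leader = new ToyLeader(100);
        for (int i = 0; i < writesBefore; i++) leader.commitWrite();
        leader.queueNewLeader();
        for (int i = 0; i < writesDuring; i++) leader.commitWrite();
        long snapshotZxid = leader.sendSnapshot();
        return coveredBeforeNewLeader(leader.queue, 100) >= snapshotZxid;
    }

    // Ordering described as the fix: NEWLEADER is queued after the snapshot is sent.
    static boolean newOrderCovers(int writesBefore, int writesDuring) {
        ToyLeader leader = new ToyLeader(100);
        for (int i = 0; i < writesBefore; i++) leader.commitWrite();
        for (int i = 0; i < writesDuring; i++) leader.commitWrite();
        long snapshotZxid = leader.sendSnapshot();
        leader.queueNewLeader();
        return coveredBeforeNewLeader(leader.queue, 100) >= snapshotZxid;
    }

    public static void main(String[] args) {
        System.out.println("old order covers snapshot zxid: " + oldOrderCovers(2, 3));
        System.out.println("new order covers snapshot zxid: " + newOrderCovers(2, 3));
    }
}
```

The model ignores everything else about the sync (fuzzy snapshots, acks, timeouts), so it should be read as an illustration of the packet ordering and not as a description of the committed change.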