Fangmin Lv created ZOOKEEPER-3104:
-------------------------------------

             Summary: Potential data inconsistency due to NEWLEADER packet 
being sent too early during SNAP sync
                 Key: ZOOKEEPER-3104
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.13, 3.5.4, 3.6.0
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv


Currently, in SNAP sync, the leader starts queuing proposals/commits and 
the NEWLEADER packet before sending the snapshot over the wire. So it's 
possible that the zxid associated with the snapshot is higher than the zxids 
of all the packets queued before NEWLEADER.
 
When the follower receives the snapshot, it applies all the txns queued 
before NEWLEADER, which may not cover all the txns up to the zxid in the 
snapshot. After that, it writes the snapshot out to disk with the zxid 
associated with the snapshot. If the server crashes after writing this 
out, then when loading the data from disk it will use the zxid of the 
snapshot file to sync with the leader. This could cause data inconsistency, 
because only part of the historical data was replayed during the previous sync.
 
The NEWLEADER packet means the learner now has the correct and almost 
up-to-date state of the leader, so it makes more sense to send the NEWLEADER 
packet after sending the snapshot, and this is what we did in the fix.
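
The ordering problem can be illustrated with a small toy model. This is only 
a sketch and not ZooKeeper code: the class name, the method name, and all zxid 
values are hypothetical. It also assumes, for illustration, that the learner 
can rely on the transferred snapshot only up to an earlier zxid and must 
replay queued txns to cover the rest up to the zxid stamped on the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only. Names and numbers are hypothetical and do not
// correspond to the ZooKeeper code base.
class SnapSyncSketch {

    /**
     * Returns the zxids the learner never applies after a crash and restart.
     *
     * @param newLeaderAfterSnapshot false models NEWLEADER queued before the
     *        snapshot is sent; true models NEWLEADER queued after it.
     */
    static List<Long> missingAfterRestart(boolean newLeaderAfterSnapshot) {
        final long leaderLastZxid = 10; // leader has committed txns 1..10
        final long snapshotZxid = 8;    // zxid stamped on the transferred snapshot
        final long reliableUpTo = 4;    // assumption: snapshot content is complete only to here

        // Txns queued ahead of NEWLEADER. With the old ordering the queue was
        // closed early (at 6 here); with the new ordering it reaches the snapshot zxid.
        final long queuedBeforeNewLeader = newLeaderAfterSnapshot ? snapshotZxid : 6;

        TreeSet<Long> applied = new TreeSet<>();
        for (long z = 1; z <= reliableUpTo; z++) applied.add(z);                         // from snapshot
        for (long z = reliableUpTo + 1; z <= queuedBeforeNewLeader; z++) applied.add(z); // replayed

        // On NEWLEADER the learner writes its state to disk, labelled with the snapshot zxid.
        long persistedZxid = snapshotZxid;

        // Crash and restart: the learner resyncs starting just after the persisted zxid.
        for (long z = persistedZxid + 1; z <= leaderLastZxid; z++) applied.add(z);

        List<Long> missing = new ArrayList<>();
        for (long z = 1; z <= leaderLastZxid; z++) {
            if (!applied.contains(z)) missing.add(z);
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("NEWLEADER before snapshot: missing " + missingAfterRestart(false));
        System.out.println("NEWLEADER after snapshot:  missing " + missingAfterRestart(true));
    }
}
```

In this model the old ordering leaves txns 7 and 8 unapplied, because the file 
on disk claims zxid 8 while the replay stopped at 6. With NEWLEADER queued 
after the snapshot, nothing is missing.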
 
Besides this, the socket timeout is changed to the smaller sync timeout once 
the NEWLEADER ack is received. In high write traffic ensembles with large 
snapshots, the follower might be timed out by the leader before those queued 
txns have all been sent over after the snapshot is written out, which could 
leave the follower in the syncing state forever. Moving the NEWLEADER packet 
after the snapshot transfer avoids this issue as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
