[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611777#comment-16611777
 ] 

Andor Molnar commented on ZOOKEEPER-3104:
-----------------------------------------

Thanks [~lvfangmin]

We get few, if any, data inconsistency reports against 3.4, so I would say 
fixing this in 3.5 as well would be beneficial for us.

> Potential data inconsistency due to NEWLEADER packet being sent too early 
> during SNAP sync
> ------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3104
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, in SNAP sync, the leader starts queuing the proposals/commits 
> and the NEWLEADER packet before sending the snapshot over the wire. So it's 
> possible that the zxid associated with the snapshot is higher than that of 
> all the packets queued before NEWLEADER.
>  
> When the follower receives the snapshot, it applies all the txns queued 
> before NEWLEADER, which may not cover all the txns up to the zxid in the 
> snapshot. After that, it writes the snapshot out to disk with the zxid 
> associated with the snapshot. If the server crashes after writing this 
> out, then when loading the data from disk it will use the zxid of the 
> snapshot file to sync with the leader. This can cause data inconsistency, 
> because only part of the historical data was replayed during the previous 
> sync.
>  
> The NEWLEADER packet means the learner now has the correct and almost 
> up-to-date state as the leader, so it makes more sense to send the NEWLEADER 
> packet after sending over the snapshot, and this is what we did in the fix.
>  
> Besides this, the socket timeout is changed to the smaller sync timeout 
> after the NEWLEADER ack is received. In high write traffic ensembles with 
> large snapshots, the follower might be timed out by the leader before those 
> queued txns have been sent over after the snapshot is written out, which 
> could leave the follower stuck in the syncing state forever. Moving the 
> NEWLEADER packet after sending over the snapshot avoids this issue as well.
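
For readers who want to see the ordering problem concretely, here is a minimal toy model in Python. It is not ZooKeeper code: the Packet class, build_stream, and covered_before_newleader are hypothetical names invented for this sketch, and the assumption that the leader queues txns 1..queue_start_zxid before the snapshot is taken is a simplification of the description above. It only checks whether the txns a follower sees before NEWLEADER reach the zxid the snapshot is labelled with.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    kind: str      # "SNAP", "TXN" or "NEWLEADER" (toy packet types)
    zxid: int = 0  # unused for NEWLEADER in this sketch


def build_stream(queue_start_zxid: int, snap_zxid: int,
                 newleader_after_snap: bool) -> List[Packet]:
    """Hypothetical leader: txns up to queue_start_zxid are queued before the
    snapshot is taken; the snapshot itself is labelled with snap_zxid."""
    early = [Packet("TXN", z) for z in range(1, queue_start_zxid + 1)]
    late = [Packet("TXN", z) for z in range(queue_start_zxid + 1, snap_zxid + 1)]
    snap = [Packet("SNAP", snap_zxid)]
    marker = [Packet("NEWLEADER")]
    if newleader_after_snap:
        # Ordering described for the fix: NEWLEADER follows the snapshot
        # and everything queued while it was being sent.
        return snap + early + late + marker
    # Ordering described as buggy: NEWLEADER was queued before the
    # snapshot was sent, so later txns land behind it.
    return snap + early + marker + late


def covered_before_newleader(stream: List[Packet]) -> bool:
    """True if the txns seen before NEWLEADER reach the snapshot's zxid."""
    snap_zxid = next(p.zxid for p in stream if p.kind == "SNAP")
    highest = 0
    for p in stream:
        if p.kind == "NEWLEADER":
            break
        if p.kind == "TXN":
            highest = max(highest, p.zxid)
    return highest >= snap_zxid


old_order = build_stream(queue_start_zxid=5, snap_zxid=8, newleader_after_snap=False)
new_order = build_stream(queue_start_zxid=5, snap_zxid=8, newleader_after_snap=True)
print(covered_before_newleader(old_order), covered_before_newleader(new_order))
```

In this toy model, a follower that persists the snapshot when it sees NEWLEADER would, under the old ordering, write a file labelled with zxid 8 after replaying txns only up to zxid 5, which is the gap the description refers to.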



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
