[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281390#comment-15281390
 ] 

Flavio Junqueira commented on ZOOKEEPER-1549:
---------------------------------------------

I've done some further analysis of this issue after some time (it took me a 
while to recover all uncached state), and here is what I think needs to happen.

It is not incorrect that we take snapshots where we are taking. Snapshots are 
simply compacted versions of the txn log history. The real problem is that we 
are apparently truncating the txn log appropriately, but are loading a snapshot 
that does not reflect the state of the txn log. What needs to happen is that 
once we truncate, we need to make sure that the snapshot we load reflects the 
right txn log state, which we aren't doing and is causing the problem described 
here. The solution is to go back to an earlier snapshot and rebuild the state 
from that snapshot. [~fanster.z] actually proposed early on to delete the 
snapshot and that is sounding like a viable way to sort this out.

I'll let people chime in and otherwise and I'll mature the idea a bit longer 
and will start working on a patch.

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-1549-3.4.patch, ZOOKEEPER-1549-learner.patch, 
> case.patch
>
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correct if the snapshot is 
> not correct.
> here is scenario(similar to 1154):
> Initial Condition
> 1.    Lets say there are three nodes in the ensemble A,B,C with A being the 
> leader
> 2.    The current epoch is 7. 
> 3.    For simplicity of the example, lets say zxid is a two digit number, 
> with epoch being the first digit.
> 4.    The zxid is 73
> 5.    All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there 
> is a crash of the entire ensemble and B,C never write the change 74 to their 
> log.
> Step 2
> A,B restart, A is elected as the new leader,  and A will load data and take a 
> clean snapshot(change 74 is in it), then send diff to B, but B died before 
> sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, 
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncate the log, A will 
> load the snapshot again which is not correct.
> In 3.3 branch, FileTxnSnapLog.restore does not call listener(ZOOKEEPER-874), 
> the leader will send a snapshot to follower, it will not be a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to