[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539015#comment-13539015
 ] 

Flavio Junqueira commented on ZOOKEEPER-1549:
---------------------------------------------

Two points are critical here for the correctness of the protocol: the state 
is persisted in the right order, and the follower changes its current epoch 
when it has accepted the state the leader is proposing. It is fine (and even 
necessary) to persist the transactions that the follower accepts, but as we 
have discussed already, they can't be part of a snapshot if they haven't been 
committed. Note that it is ok for a follower to accept these transactions even 
if it loses its connection to the prospective leader.
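
To make the two points concrete, here is a minimal sketch in Java. It is a toy 
model, not ZooKeeper code: the class and method names are hypothetical, the 
"log" is an in-memory list, and it reads "changes its current epoch when it 
has accepted the state" as "only after the proposed state is in the log". 
Zxids follow the two-digit convention of the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a follower's durable state. Every name here is hypothetical. */
final class FollowerStateSketch {
    private final List<Long> acceptedLog = new ArrayList<>();   // persisted, possibly uncommitted
    private final Set<Long> committed = new LinkedHashSet<>();  // zxids known to be committed
    private long currentEpoch;

    FollowerStateSketch(long currentEpoch) {
        this.currentEpoch = currentEpoch;
    }

    /** Accepting a transaction persists it to the log; it does not commit it. */
    void accept(long zxid) {
        acceptedLog.add(zxid);
    }

    /** Marks a previously accepted transaction as committed. */
    void commit(long zxid) {
        if (!acceptedLog.contains(zxid)) {
            throw new IllegalStateException("commit before accept: " + zxid);
        }
        committed.add(zxid);
    }

    /** Assumed ordering: the epoch moves only after the leader's state is in the log. */
    void acceptLeaderState(long leaderEpoch, List<Long> leaderTxns) {
        for (long zxid : leaderTxns) {
            accept(zxid);            // 1. persist the proposed history first
        }
        currentEpoch = leaderEpoch;  // 2. then record the new epoch
    }

    /** A snapshot may contain committed transactions only. */
    List<Long> snapshotContents() {
        List<Long> out = new ArrayList<>();
        for (long zxid : acceptedLog) {
            if (committed.contains(zxid)) {
                out.add(zxid);
            }
        }
        return out;
    }

    long currentEpoch() {
        return currentEpoch;
    }

    List<Long> acceptedLog() {
        return new ArrayList<>(acceptedLog);
    }

    public static void main(String[] args) {
        FollowerStateSketch f = new FollowerStateSketch(7);
        f.accept(73L);
        f.commit(73L);
        f.acceptLeaderState(8, Arrays.asList(81L)); // 81 is accepted but not yet committed
        System.out.println("log=" + f.acceptedLog()
                + " snapshot=" + f.snapshotContents()
                + " epoch=" + f.currentEpoch());
    }
}
```

The snapshot filter uses an explicit committed set rather than a zxid 
comparison on purpose: an accepted zxid that sorts below a committed one is 
not necessarily committed itself.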

[~thawan]
bq. In your proposal, the leader may tear down the quorum if syncLimit is not 
long enough to cover snapshotting time.

I'm not sure which part of the proposal you're referring to here, but if the 
follower does not receive a complete snapshot, it should throw it away. If the 
snapshot is received fully but the subsequent diff isn't, it is still fine to 
persist the snapshot as long as the snapshot doesn't contain uncommitted 
state.
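
As a rough illustration of that rule, the decision can be written as a small 
table. This is only a sketch with hypothetical names; in particular, treating 
"contains uncommitted state" as a reason to discard is my reading of the 
condition above, not existing ZooKeeper behavior.

```java
/** Toy decision table for a follower receiving a snapshot; names are hypothetical. */
final class SnapshotTransferSketch {
    enum Action { DISCARD, PERSIST }

    /**
     * Whether the subsequent diff arrived is deliberately not an input:
     * only the snapshot's completeness and contents matter here.
     */
    static Action onSnapshotTransferEnd(boolean receivedFully, boolean containsUncommittedState) {
        if (!receivedFully) {
            return Action.DISCARD;  // an incomplete snapshot is thrown away
        }
        // Assumption: a complete snapshot is persisted only if it is clean.
        return containsUncommittedState ? Action.DISCARD : Action.PERSIST;
    }

    public static void main(String[] args) {
        System.out.println(onSnapshotTransferEnd(false, false));
        System.out.println(onSnapshotTransferEnd(true, false));
        System.out.println(onSnapshotTransferEnd(true, true));
    }
}
```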

[~fournc]
bq. I wanted to try and look at this but the test case provided doesn't 
replicate for me in the 3.4 branch.

Interesting, it does reproduce for me on trunk, reliably.

bq. The devil will be in writing a reproducible test case for this madness.

There are possibly multiple ways of getting a dirty snapshot. I don't think a 
single test will cover all possibilities.  

bq. Is anyone working on doing that?

I have written no code so far, aside from a few experiments in which I removed 
calls to take a snapshot here and there. I don't know if [~thawan] has written 
any code. 

[~thawan] are you interested in working on/providing a patch? Otherwise, I can 
work on it.

                
> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Priority: Blocker
>         Attachments: case.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
> not correct.
> Here is the scenario (similar to 1154):
> Initial Condition
> 1.    Let's say there are three nodes in the ensemble, A, B, and C, with A 
> being the leader.
> 2.    The current epoch is 7. 
> 3.    For simplicity of the example, let's say the zxid is a two-digit number, 
> with the epoch being the first digit.
> 4.    The zxid is 73
> 5.    All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there 
> is a crash of the entire ensemble and B,C never write the change 74 to their 
> log.
> Step 2
> A and B restart. A is elected as the new leader, loads its data, and takes a 
> clean snapshot (change 74 is in it), then sends a diff to B, but B dies before 
> syncing with A. A dies later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, 
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will 
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower and this 
> will not be a problem.
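
The range check in Step 4 of the quoted description can be traced with a toy 
model. This is a sketch under stated assumptions, not the leader's actual 
code: it uses the description's two-digit zxid (first digit epoch, second 
digit counter) instead of ZooKeeper's real 64-bit zxid, which keeps the epoch 
in the high 32 bits; all names are hypothetical; and the contents of B's log 
(71, 72, 73, 81) are assumed from the scenario.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy version of the Step 4 check; names and log contents are assumptions. */
final class DiffDecisionSketch {
    /** Epoch of a two-digit toy zxid (first digit). */
    static int epochOf(int zxid) {
        return zxid / 10;
    }

    /** Counter of a two-digit toy zxid (second digit). */
    static int counterOf(int zxid) {
        return zxid % 10;
    }

    /** The range test from the description: minCommitLog <= peerZxid <= maxCommitLog. */
    static boolean withinCommitLog(int peerZxid, int minCommitLog, int maxCommitLog) {
        return minCommitLog <= peerZxid && peerZxid <= maxCommitLog;
    }

    public static void main(String[] args) {
        // Assumed contents of B's committed log after request 81 succeeds.
        Set<Integer> leaderLog = new TreeSet<>(Arrays.asList(71, 72, 73, 81));
        int peerLastZxid = 74; // what A reports when it registers as a follower

        // The range test passes, so B picks a diff for A...
        System.out.println("in range: " + withinCommitLog(peerLastZxid, 71, 81));
        // ...even though B never logged 74, which belongs to the older epoch.
        System.out.println("in leader log: " + leaderLog.contains(peerLastZxid));
        System.out.println("epochs: " + epochOf(peerLastZxid) + " vs " + epochOf(81));
    }
}
```

The point of the sketch is that the range test alone says nothing about 
whether the follower's last zxid was ever committed by the current leader.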

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
