[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky007 updated ZOOKEEPER-1549:
--------------------------------

    Description: 
The TRUNC code (from ZOOKEEPER-1154?) does not work correctly if the snapshot
is not correct.
Here is the scenario (similar to ZOOKEEPER-1154):
Initial Condition
1. Let's say there are three nodes in the ensemble, A, B and C, with A being
   the leader.
2. The current epoch is 7.
3. For simplicity of the example, let's say a zxid is a two-digit number, with
   the epoch being the first digit.
4. The current zxid is 73.
5. All the nodes have seen change 73 and have persistently logged it.
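
To make the two-digit convention concrete, here is a minimal sketch of it next
to the real layout, where the epoch occupies the high 32 bits of a 64-bit zxid
and the counter the low 32 bits. The class and method names (ToyZxid, toy,
real) are made up for this illustration and are not ZooKeeper APIs.

```java
final class ToyZxid {
    // Toy model used in this example: first decimal digit is the epoch,
    // second decimal digit is the counter within that epoch.
    static int toy(int epoch, int counter) {
        return epoch * 10 + counter;
    }

    static int toyEpoch(int zxid) {
        return zxid / 10;
    }

    static int toyCounter(int zxid) {
        return zxid % 10;
    }

    // Real layout: epoch in the high 32 bits, counter in the low 32 bits.
    static long real(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        System.out.println(toy(7, 3));            // zxid 73 in the toy model
        System.out.println(toyEpoch(81));         // epoch of zxid 81
        System.out.println(Long.toHexString(real(7, 3)));
    }
}
```

In both layouts any zxid from epoch 8 compares greater than any zxid from
epoch 7, which is why 74 falls numerically between 71 and 81 in Step 4.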
Step 1
A request with zxid 74 is issued. The leader A writes it to its log, but the
entire ensemble crashes and B and C never write change 74 to their logs.
Step 3
B and C restart; A is still down.
B and C form a quorum.
B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
The epoch is now 8 and the zxid is 80.
A request with zxid 81 succeeds. On B, minCommitLog is still 71 and
maxCommitLog is now 81.
Step 4
A starts up. It applies the change from the request with zxid 74 to its
in-memory data tree.
A contacts B to registerAsFollower and provides 74 as its zxid.
Since 71 <= 74 <= 81, B decides to send A a DIFF. B will send A proposal 81.
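
The range check in Step 4 can be sketched roughly as follows. This is a
simplified model of the decision described above, using the toy two-digit
zxids; it is not the actual leader code, and the names NaiveSyncDecision, Sync
and choose are hypothetical.

```java
final class NaiveSyncDecision {
    enum Sync { DIFF, TRUNC, SNAP }

    // Decide how to sync a follower using only the bounds of the leader's
    // committed log, as in the scenario above.
    static Sync choose(int peerLastZxid, int minCommitLog, int maxCommitLog) {
        if (peerLastZxid >= minCommitLog && peerLastZxid <= maxCommitLog) {
            // In range: assume the follower only needs the newer proposals.
            return Sync.DIFF;
        }
        if (peerLastZxid > maxCommitLog) {
            // Follower is ahead of the leader: tell it to truncate.
            return Sync.TRUNC;
        }
        // Follower is too far behind: send a full snapshot.
        return Sync.SNAP;
    }

    public static void main(String[] args) {
        // Step 4: B has [71, 81], A reports 74.
        System.out.println(choose(74, 71, 81));
        // Variant where maxCommitLog stayed at 73.
        System.out.println(choose(74, 71, 73));
    }
}
```

With the bounds [71, 81] the check picks DIFF for zxid 74 even though B never
logged 74; with the bounds [71, 73] it picks TRUNC.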
Problem:
The problem with the above sequence is that A's data tree contains the update
from request 74, which B and C never logged and which is therefore not correct.
Before receiving proposal 81, A should have received a TRUNC to 73. I don't see
that in the code. If maxCommitLog on B had stayed at 73 instead of advancing to
81, that case seems to be fine.
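
One way to picture the missing step is to look the follower's zxid up in the
leader's committed log instead of only comparing it against the bounds. The
sketch below is an illustration of that idea under stated assumptions, not a
proposed patch: the class TruncThenDiff, the method plan, and the message
strings are hypothetical, the log is assumed to be sorted in ascending order,
and B's log is assumed to contain 71, 72, 73 and 81.

```java
import java.util.ArrayList;
import java.util.List;

final class TruncThenDiff {
    // Returns the messages the leader would send to a follower whose last
    // zxid is peerLastZxid. committedLog must be sorted ascending.
    static List<String> plan(List<Integer> committedLog, int peerLastZxid) {
        List<String> messages = new ArrayList<>();

        // Largest zxid in the leader's log that is <= the follower's zxid.
        int lastCommon = -1;
        for (int zxid : committedLog) {
            if (zxid <= peerLastZxid) {
                lastCommon = zxid;
            }
        }

        if (lastCommon == -1) {
            // Nothing in common within the log window: fall back to a snapshot.
            messages.add("SNAP");
            return messages;
        }
        if (lastCommon != peerLastZxid) {
            // The follower has a zxid the leader never logged: roll it back.
            messages.add("TRUNC " + lastCommon);
        }
        // Then send everything the follower is missing.
        for (int zxid : committedLog) {
            if (zxid > lastCommon) {
                messages.add("PROPOSAL " + zxid);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Integer> logOnB = java.util.Arrays.asList(71, 72, 73, 81);
        System.out.println(plan(logOnB, 74));
    }
}
```

For the scenario above this yields a TRUNC to 73 followed by proposal 81. Note
that a TRUNC only helps if A can actually undo change 74, which is the case
this issue is about when 74 has already been captured in a snapshot.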

  was:ZOOKEEPER-1154

    
> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3, 3.3.6
>            Reporter: Jacky007
