[
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454682#comment-13454682
]
Jacky007 commented on ZOOKEEPER-1549:
-------------------------------------
Thanks for Falvio to comment. I agree with Falvio, deleting a snapshot is too
dangerous to be an optional. There should be another solution.
Step 1
74 is in A's transaction logs.
Step 2
A is the new leader, and it will execute the following code.
{noformat}
void lead() throws IOException, InterruptedException {
self.end_fle = System.currentTimeMillis();
LOG.info("LEADING - LEADER ELECTION TOOK - " +
(self.end_fle - self.start_fle));
self.start_fle = 0;
self.end_fle = 0;
zk.registerJMX(new LeaderBean(this, zk), self.jmxLocalPeerBean);
try {
self.tick = 0;
zk.loadData();
{noformat}
Then A will load its snapshot and committedlog.
{noformat}
public void loadData() throws IOException, InterruptedException {
setZxid(zkDb.loadDataBase());
// Clean up dead sessions
LinkedList<Long> deadSessions = new LinkedList<Long>();
for (Long session : zkDb.getSessions()) {
if (zkDb.getSessionWithTimeOuts().get(session) == null) {
deadSessions.add(session);
}
}
zkDb.setDataTreeInit(true);
for (long session : deadSessions) {
// XXX: Is lastProcessedZxid really the best thing to use?
killSession(session, zkDb.getDataTreeLastProcessedZxid());
}
// Make a clean snapshot
takeSnapshot();
}
{noformat}
when A takeSnapshot(), 74 is in it(if A dies after that, B will never know it).
When A load database,
{noformat}
public void loadData() throws IOException, InterruptedException {
setZxid(zkDb.loadDataBase());
{noformat}
it will restore database from snapshots and transaction logs,
{noformat}
long zxid = snapLog.restore(dataTree,sessionsWithTimeouts,listener);
{noformat}
{noformat}
try {
processTransaction(hdr,dt,sessions, itr.getTxn());
} catch(KeeperException.NoNodeException e) {
throw new IOException("Failed to process transaction type: " +
hdr.getType() + " error: " + e.getMessage(), e);
}
listener.onTxnLoaded(hdr, itr.getTxn());
{noformat}
but 74 is in A's transaction logs.
> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.3
> Reporter: Jacky007
> Priority: Critical
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correct if the snapshot is
> not correct.
> here is scenario(similar to 1154):
> Initial Condition
> 1. Lets say there are three nodes in the ensemble A,B,C with A being the
> leader
> 2. The current epoch is 7.
> 3. For simplicity of the example, lets say zxid is a two digit number,
> with epoch being the first digit.
> 4. The zxid is 73
> 5. All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there
> is a crash of the entire ensemble and B,C never write the change 74 to their
> log.
> Step 2
> A,B restart, A is elected as the new leader, and A will load data and take a
> clean snapshot(change 74 is in it), then send diff to B, but B died before
> sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71,
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff.
> Problem:
> The problem with the above sequence is that after truncate the log, A will
> load the snapshot again which is not correct.
> In 3.3 branch, FileTxnSnapLog.restore does not call listener(ZOOKEEPER-874),
> the leader will send a snapshot to follower, it will not be a problem.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira