[ https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071249#comment-13071249 ]
Vishal K commented on ZOOKEEPER-1090: ------------------------------------- I don't have a test that that will fail without the patch. Earlier, we had a problem because snapshot.x did not contain x. Now, we cannot run into that scenario because we first apply x and then increment lastProcessedZxid. with the patch we can only have a scenario where we have applied x in snapshot.x-1. Note - a snapshot can always contain data > than lastProcessedZxid so the patch is not introducing a new scenario. So I wrote a test to verify correctness of this scenario. Any suggestions? Can you elaborate on our comment: {quote} rather set before the data tree is modified): Datatree may not have the modification for lastProcessedZxid snapshot does not contain change corresponding to lastProcessedZxid) restore will not restore the data tree correctly: lastProcessedZxid is deserialized posting the discussion shortly. {quote} After going through the code again, I think the following hypothetical failure scenario could be a problem: For some reason, modification to data tree fails for transactions x to x-n on one of the peer. The lastProcessedZxid will still be set to x. If the peer restores after snapshot, then it will not have transactions x-n to x in its data tree. Now, if the modification had failed due to transient errors (say out of memory), then we will not apply transactions x-n to x even if we should have. The only other reason we could run into this scenario (apart from runtime failures) is if we had a bug in DataTree modification code path. This looks like a rare corner case. Note - the above problem is not an effect of the patch. it is present in current implementation as well. > Race condition while taking snapshot can lead to not restoring data tree > correctly > ---------------------------------------------------------------------------------- > > Key: ZOOKEEPER-1090 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.3.3 > Reporter: Vishal K > Assignee: Vishal K > Priority: Critical > Labels: persistence, server, snapshot > Fix For: 3.4.0 > > Attachments: ZOOKEEPER-1090 > > > I think I have found a bug in the snapshot mechanism. > The problem occurs because dt.lastProcessedZxid is not synchronized (or > rather set before the data tree is modified): > FileTxnSnapLog: > {code} > public void save(DataTree dataTree, > ConcurrentHashMap<Long, Integer> sessionsWithTimeouts) > throws IOException { > long lastZxid = dataTree.lastProcessedZxid; > LOG.info("Snapshotting: " + Long.toHexString(lastZxid)); > File snapshot=new File( > snapDir, Util.makeSnapshotName(lastZxid)); > snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot); <=== > the Datatree may not have the modification for lastProcessedZxid > } > {code} > DataTree: > {code} > public ProcessTxnResult processTxn(TxnHeader header, Record txn) { > ProcessTxnResult rc = new ProcessTxnResult(); > String debug = ""; > try { > rc.clientId = header.getClientId(); > rc.cxid = header.getCxid(); > rc.zxid = header.getZxid(); > rc.type = header.getType(); > rc.err = 0; > if (rc.zxid > lastProcessedZxid) { > lastProcessedZxid = rc.zxid; > } > [...modify data tree...] > } > {code} > The lastProcessedZxid must be set after the modification is done. > As a result, if server crashes after taking the snapshot (and the snapshot > does not contain change corresponding to lastProcessedZxid) restore will not > restore the data tree correctly: > {code} > public long restore(DataTree dt, Map<Long, Integer> sessions, > PlayBackListener listener) throws IOException { > snapLog.deserialize(dt, sessions); > FileTxnLog txnLog = new FileTxnLog(dataDir); > TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes > lastProcessedZxid is deserialized > } > {code} > I have had offline discussion with Ben and Camille on this. I will be posting > the discussion shortly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira