Race condition while taking snapshot can lead to not restoring data tree correctly
----------------------------------------------------------------------------------
                 Key: ZOOKEEPER-1090
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.3
            Reporter: Vishal K
            Priority: Critical
             Fix For: 3.4.0

I think I have found a bug in the snapshot mechanism. The problem occurs because the update of dt.lastProcessedZxid is not synchronized with the update of the data tree (or rather, it is set before the data tree is modified):

FileTxnSnapLog:
{code}
public void save(DataTree dataTree,
        ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
    long lastZxid = dataTree.lastProcessedZxid;
    LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
    File snapshot = new File(
            snapDir, Util.makeSnapshotName(lastZxid));
    // <=== the DataTree may not yet have the modification for lastProcessedZxid
    snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);
}
{code}

DataTree:
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    String debug = "";
    try {
        rc.clientId = header.getClientId();
        rc.cxid = header.getCxid();
        rc.zxid = header.getZxid();
        rc.type = header.getType();
        rc.err = 0;
        // <=== lastProcessedZxid is advanced before the tree is modified
        if (rc.zxid > lastProcessedZxid) {
            lastProcessedZxid = rc.zxid;
        }
        [...modify data tree...]
}
{code}

The lastProcessedZxid must be set after the modification is done.

As a result, if the server crashes after taking the snapshot (and the snapshot does not contain the change corresponding to lastProcessedZxid), restore will not restore the data tree correctly, because it replays the transaction log starting only from lastProcessedZxid+1 and so skips the missing change:
{code}
public long restore(DataTree dt, Map<Long, Integer> sessions,
        PlayBackListener listener) throws IOException {
    snapLog.deserialize(dt, sessions);
    FileTxnLog txnLog = new FileTxnLog(dataDir);
    // <=== assumes the change for lastProcessedZxid was deserialized
    TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1);
}
{code}

I have had an offline discussion with Ben and Camille on this. I will be posting the discussion shortly.
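
To make the ordering problem concrete, here is a minimal single-threaded sketch of the interleaving described above. It is not ZooKeeper code: ToyTree, ToySnapshot, Txn and every method name are hypothetical stand-ins, the "snapshot thread" is simulated by taking the snapshot between the two steps of applying a transaction, and it assumes that replaying an already-applied transaction is harmless.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all types and names here are hypothetical, not ZooKeeper APIs.
final class SnapshotOrderingSketch {

    /** Hypothetical log entry: "set path to data at this zxid". */
    static final class Txn {
        final long zxid;
        final String path;
        final String data;

        Txn(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    /** Stand-in for the data tree: a map plus the last processed zxid. */
    static final class ToyTree {
        final Map<String, String> nodes = new HashMap<>();
        long lastProcessedZxid = 0;
    }

    /** Stand-in for a snapshot: a copy of the nodes, labelled with a zxid. */
    static final class ToySnapshot {
        final Map<String, String> nodes;
        final long zxid;

        ToySnapshot(ToyTree tree) {
            this.nodes = new HashMap<>(tree.nodes);
            this.zxid = tree.lastProcessedZxid;
        }
    }

    /** Applies txn and takes a snapshot between the two steps of the update. */
    static ToySnapshot applyWithSnapshotInBetween(ToyTree tree, Txn txn, boolean zxidFirst) {
        ToySnapshot snap;
        if (zxidFirst) {
            // Ordering reported in this issue: zxid advances, then the tree changes.
            tree.lastProcessedZxid = Math.max(tree.lastProcessedZxid, txn.zxid);
            snap = new ToySnapshot(tree); // simulated snapshot thread runs here
            tree.nodes.put(txn.path, txn.data);
        } else {
            // Proposed ordering: the tree changes, then the zxid advances.
            tree.nodes.put(txn.path, txn.data);
            snap = new ToySnapshot(tree); // simulated snapshot thread runs here
            tree.lastProcessedZxid = Math.max(tree.lastProcessedZxid, txn.zxid);
        }
        return snap;
    }

    /** Loads the snapshot, then replays log entries with zxid above the snapshot's. */
    static Map<String, String> restore(ToySnapshot snap, List<Txn> log) {
        Map<String, String> nodes = new HashMap<>(snap.nodes);
        for (Txn txn : log) {
            if (txn.zxid > snap.zxid) {
                nodes.put(txn.path, txn.data);
            }
        }
        return nodes;
    }

    /** Returns whether the node written at zxid 1 survives snapshot plus restore. */
    static boolean restoredHasNode(boolean zxidFirst) {
        Txn txn = new Txn(1L, "/a", "v1");
        List<Txn> log = new ArrayList<>();
        log.add(txn);
        ToySnapshot snap = applyWithSnapshotInBetween(new ToyTree(), txn, zxidFirst);
        return restore(snap, log).containsKey("/a");
    }

    public static void main(String[] args) {
        System.out.println("zxid set first, node restored: " + restoredHasNode(true));
        System.out.println("zxid set last, node restored: " + restoredHasNode(false));
    }
}
```

With the zxid set first, the snapshot is labelled with zxid 1 but lacks the node, and replay starts after zxid 1, so the change is lost. With the zxid set last, the snapshot is labelled with zxid 0, and replaying zxid 1 simply writes the same node again.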