[ https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046571#comment-13046571 ]
Vishal K commented on ZOOKEEPER-1090: ------------------------------------- yeah, you are right. I think we should make update of datatree and lastProcessedzxid look atomic. On Wed, Jun 8, 2011 at 2:14 PM, Fournier, Camille F. [Tech] wrote: So we might over increment a parent cvid... What does that mean for a client? Maybe missing a seq id node number? Seems ok but should probably document. C From: Vishal Kher To: Fournier, Camille F. [Tech] Cc: Benjamin Reed Sent: Wed Jun 08 13:53:59 2011 Subject: Re: Race condition while taking snapshot This will result in datatree having latest data but lastProcessedZxid not up-to-date. So when you take snapshot.x, the file may contain data for x + 1. But I think this is ok (and this is how snpashot currently works) because during restore we will start applying transactions from x + 1. Some of them will fail (e.,g., create/delete) and we will increment only the zxid on failure. I have not analyzed this completely yet. But I think it will be ok. Thanks. -Vishal On Wed, Jun 8, 2011 at 1:33 PM, Fournier, Camille F. [Tech] wrote: Yes I think you are right. What do we think the implications of moving the setting of lastProcessedZxid to below the processing of the txn? Looks like not a big problem but I haven’t the mental bandwidth to think it through at the moment. C > Race condition while taking snapshot can lead to not restoring data tree > correctly > ---------------------------------------------------------------------------------- > > Key: ZOOKEEPER-1090 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.3.3 > Reporter: Vishal K > Priority: Critical > Labels: persistence, server, snapshot > Fix For: 3.4.0 > > > I think I have found a bug in the snapshot mechanism. > The problem occurs because dt.lastProcessedZxid is not synchronized (or > rather set before the data tree is modified): > FileTxnSnapLog: > {code} > public void save(DataTree dataTree, > ConcurrentHashMap<Long, Integer> sessionsWithTimeouts) > throws IOException { > long lastZxid = dataTree.lastProcessedZxid; > LOG.info("Snapshotting: " + Long.toHexString(lastZxid)); > File snapshot=new File( > snapDir, Util.makeSnapshotName(lastZxid)); > snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot); <=== > the Datatree may not have the modification for lastProcessedZxid > } > {code} > DataTree: > {code} > public ProcessTxnResult processTxn(TxnHeader header, Record txn) { > ProcessTxnResult rc = new ProcessTxnResult(); > String debug = ""; > try { > rc.clientId = header.getClientId(); > rc.cxid = header.getCxid(); > rc.zxid = header.getZxid(); > rc.type = header.getType(); > rc.err = 0; > if (rc.zxid > lastProcessedZxid) { > lastProcessedZxid = rc.zxid; > } > [...modify data tree...] > } > {code} > The lastProcessedZxid must be set after the modification is done. > As a result, if server crashes after taking the snapshot (and the snapshot > does not contain change corresponding to lastProcessedZxid) restore will not > restore the data tree correctly: > {code} > public long restore(DataTree dt, Map<Long, Integer> sessions, > PlayBackListener listener) throws IOException { > snapLog.deserialize(dt, sessions); > FileTxnLog txnLog = new FileTxnLog(dataDir); > TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes > lastProcessedZxid is deserialized > } > {code} > I have had offline discussion with Ben and Camille on this. I will be posting > the discussion shortly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira