Race condition while taking snapshot can lead to not restoring data tree 
correctly
----------------------------------------------------------------------------------

                 Key: ZOOKEEPER-1090
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.3
            Reporter: Vishal K
            Priority: Critical
             Fix For: 3.4.0


I think I have found a bug in the snapshot mechanism.

The problem occurs because dt.lastProcessedZxid is not synchronized (or rather 
set before the data tree is modified):

FileTxnSnapLog:
{code}
    public void save(DataTree dataTree,
            ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
        long lastZxid = dataTree.lastProcessedZxid;
        LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
        File snapshot=new File(
                snapDir, Util.makeSnapshotName(lastZxid));
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);   <=== the 
Datatree may not have the modification for lastProcessedZxid
    }
{code}

DataTree:
{code}
    public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
        ProcessTxnResult rc = new ProcessTxnResult();

        String debug = "";
        try {
            rc.clientId = header.getClientId();
            rc.cxid = header.getCxid();
            rc.zxid = header.getZxid();
            rc.type = header.getType();
            rc.err = 0;
            if (rc.zxid > lastProcessedZxid) {
                lastProcessedZxid = rc.zxid;
            }
            [...modify data tree...]           
 }
{code}
The lastProcessedZxid must be set after the modification is done.

As a result, if server crashes after taking the snapshot (and the snapshot does 
not contain change corresponding to lastProcessedZxid) restore will not restore 
the data tree correctly:
{code}
public long restore(DataTree dt, Map<Long, Integer> sessions,
            PlayBackListener listener) throws IOException {
        snapLog.deserialize(dt, sessions);
        FileTxnLog txnLog = new FileTxnLog(dataDir);
        TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes 
lastProcessedZxid is deserialized
 }
{code}


I have had offline discussion with Ben and Camille on this. I will be posting 
the discussion shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to