Adam Milne-Smith created ZOOKEEPER-2234:
-------------------------------------------

             Summary: Snapshot serialization race condition can lead to partial 
transaction and inoperable data node
                 Key: ZOOKEEPER-2234
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2234
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.4.6
            Reporter: Adam Milne-Smith
            Priority: Minor


This issue can be reproduced by creating a node with a new ACL while the data 
tree is being serialized, after the ACL cache has already been serialized. When 
restoring from this snapshot without the transaction log, the state will include 
a node whose ACL has no corresponding entry in the ACL cache. It is then 
impossible to operate on this node, as any attempt causes a MarshallingError.
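
The interleaving described above can be sketched with a toy model. This is an 
illustration only, not ZooKeeper's code: the class SnapshotRaceSketch, its 
fields, and the ACL strings are all hypothetical, and the two maps merely stand 
in for the ACL cache and the data tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical, not ZooKeeper's classes.
class SnapshotRaceSketch {
    // ACL cache: ACL id -> ACL entries. Data tree: node path -> ACL id.
    final Map<Long, List<String>> aclCache = new HashMap<>();
    final Map<String, Long> tree = new HashMap<>();
    private long nextAclId = 1;

    // Create a node, adding its ACL to the cache if it has not been seen yet.
    void create(String path, List<String> acl) {
        Long id = null;
        for (Map.Entry<Long, List<String>> e : aclCache.entrySet()) {
            if (e.getValue().equals(acl)) {
                id = e.getKey();
            }
        }
        if (id == null) {
            id = nextAclId++;
            aclCache.put(id, acl);
        }
        tree.put(path, id);
    }

    // Sorted paths whose ACL id has no entry in the given cache.
    static List<String> danglingNodes(Map<Long, List<String>> cache,
                                      Map<String, Long> nodes) {
        List<String> dangling = new ArrayList<>();
        for (Map.Entry<String, Long> e : nodes.entrySet()) {
            if (!cache.containsKey(e.getValue())) {
                dangling.add(e.getKey());
            }
        }
        Collections.sort(dangling);
        return dangling;
    }

    static List<String> demo() {
        SnapshotRaceSketch live = new SnapshotRaceSketch();
        live.create("/a", Arrays.asList("world:anyone:r"));

        // Step 1: the snapshot writes out the ACL cache.
        Map<Long, List<String>> snapCache = new HashMap<>(live.aclCache);

        // Step 2: a client creates a node with a previously unseen ACL.
        live.create("/b", Arrays.asList("digest:alice:rw"));

        // Step 3: the snapshot writes out the data tree, which now has /b.
        Map<String, Long> snapTree = new HashMap<>(live.tree);

        // Restoring from (snapCache, snapTree) leaves /b without an ACL entry.
        return danglingNodes(snapCache, snapTree);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [/b]
    }
}
```

In this model /b ends up in the snapshot's tree while its ACL id is absent from 
the snapshot's cache, which is the inconsistent state the report describes.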

If the transaction log is replayed over a server in this erroneous state, the 
state does appear to correct itself. This bug means that to reliably restore 
from a snapshot, you must also have backed up the subsequent transaction log 
covering at least the transactions that were partially written to the snapshot.

Issue first described here:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201507.mbox/%[email protected]%3E

It also appears possible for a snapshot to be missing a session yet contain an 
ephemeral node created by that session; fortunately ZooKeeperServer.loadData() 
should clean these ephemerals up.
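
A rough sketch of that kind of cleanup follows, under the assumption that it 
works by dropping ephemerals whose owner session is absent from the restored 
session table. The class EphemeralCleanupSketch, its methods, and the sample 
paths and session ids are hypothetical and are not taken from 
ZooKeeperServer.loadData().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, not ZooKeeper's implementation.
class EphemeralCleanupSketch {
    // Sorted ids of sessions that own ephemerals but are not in the session table.
    static List<Long> deadSessions(Map<Long, Set<String>> ephemeralsByOwner,
                                   Map<Long, Integer> sessionsWithTimeouts) {
        List<Long> dead = new ArrayList<>();
        for (Long owner : ephemeralsByOwner.keySet()) {
            if (!sessionsWithTimeouts.containsKey(owner)) {
                dead.add(owner);
            }
        }
        Collections.sort(dead);
        return dead;
    }

    static Set<String> demo() {
        // Restored tree: two sessions each own one ephemeral node.
        Map<Long, Set<String>> ephemerals = new HashMap<>();
        ephemerals.put(0x10L, new TreeSet<>(Arrays.asList("/locks/a")));
        ephemerals.put(0x20L, new TreeSet<>(Arrays.asList("/locks/b")));

        // Restored session table: session 0x20 is missing from the snapshot.
        Map<Long, Integer> sessions = new HashMap<>();
        sessions.put(0x10L, 30000);

        // Drop every ephemeral whose owner session is unknown.
        for (Long session : deadSessions(ephemerals, sessions)) {
            ephemerals.remove(session);
        }

        Set<String> remaining = new TreeSet<>();
        for (Set<String> paths : ephemerals.values()) {
            remaining.addAll(paths);
        }
        return remaining;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [/locks/a]
    }
}
```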



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
