Adam Milne-Smith created ZOOKEEPER-2234:
-------------------------------------------
Summary: Snapshot serialization race condition can lead to partial
transaction and inoperable data node
Key: ZOOKEEPER-2234
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2234
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.6
Reporter: Adam Milne-Smith
Priority: Minor
This issue can be reproduced by creating a node with a new ACL during data tree
serialization after ACL cache serialization. When restoring from this snapshot
without the tranlog, the state will include a node with no corresponding ACL in
the ACL cache. This node will then be impossible to operate on as it will cause
a MarshallingError.
If the tranlog is played over a server in this erroneous state, it does appear
to correct itself. This bug means that to reliably restore from a snapshot, you
must also have backed up the subsequent tranlog covering at least the
transactions that were partially written to the snapshot.
Issue first described here:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201507.mbox/%[email protected]%3E
It also appears possible for a snapshot to be missing a session yet contain an
ephemeral node created by that session; fortunately ZooKeeperServer.loadData()
should clean these ephemerals up.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)