Fangmin Lv created ZOOKEEPER-3145:
-------------------------------------
Summary: Potential watch missing issue due to stale pzxid when
replaying CloseSession txn with fuzzy snapshot
Key: ZOOKEEPER-3145
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3145
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.13, 3.5.4, 3.6.0
Reporter: Fangmin Lv
Assignee: Fangmin Lv
Fix For: 3.6.0
This is another issue I found recently. We haven't seen this problem in prod
(or maybe we just haven't noticed it).
Currently, CloseSession is not idempotent: executing the same CloseSession txn
twice does not necessarily produce the same result.
The problem is that closeSession only checks which ephemeral nodes are
associated with that session based on the current state. Nodes deleted while a
fuzzy snapshot was being taken won't be deleted again when the txn is replayed.
This looks fine, since the nodes are already gone, but there is a problem with
the pzxid of the parent node. The snapshot is taken fuzzily, so it's possible
that the parent had already been serialized while the nodes were being deleted
by the closeSession txn. The stale pzxid loaded from the snapshot will not be
updated when replaying the closeSession txn, because the replay doesn't know
which paths were deleted, so it won't patch the pzxid like we did for
deleteNode in ZOOKEEPER-3125.
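
The race described above can be reproduced with a toy model. This is only a
sketch: the Tree class, the paths /app and /app/lock, session id 7 and the
zxid values are all made up for illustration, and none of this is ZooKeeper's
actual DataTree code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical classes, not ZooKeeper's DataTree.
final class FuzzyCloseSessionDemo {
    static final class Tree {
        final Map<String, Long> pzxid = new HashMap<>();           // path -> pzxid
        final Map<Long, Set<String>> ephemerals = new HashMap<>(); // session -> owned paths

        Tree() { pzxid.put("/", 0L); }

        static String parentOf(String path) {
            int i = path.lastIndexOf('/');
            return i == 0 ? "/" : path.substring(0, i);
        }

        void create(String path, long zxid, long owner) {
            pzxid.put(path, zxid);
            pzxid.put(parentOf(path), zxid); // child list changed: bump parent pzxid
            if (owner != 0) {
                ephemerals.computeIfAbsent(owner, k -> new HashSet<>()).add(path);
            }
        }

        // Only the ephemerals known in the current state are deleted, so
        // running this again for the same session is a no-op.
        void closeSession(long session, long zxid) {
            Set<String> owned = ephemerals.remove(session);
            if (owned == null) return;
            for (String path : owned) {
                pzxid.remove(path);
                pzxid.put(parentOf(path), zxid);
            }
        }
    }

    // Returns { pzxid of /app on the live server, pzxid of /app after restore + replay }.
    static long[] run() {
        Tree live = new Tree();
        live.create("/app", 1, 0);
        live.create("/app/lock", 2, 7);
        // The snapshot thread serializes /app at this point (pzxid is 2) ...
        long snapshotParentPzxid = live.pzxid.get("/app");
        // ... and closeSession removes /app/lock before the serializer reaches it.
        live.closeSession(7, 3);

        // State loaded from the fuzzy snapshot: /app with the old pzxid,
        // no /app/lock, and no ephemeral recorded for session 7.
        Tree restored = new Tree();
        restored.create("/app", 1, 0);
        restored.pzxid.put("/app", snapshotParentPzxid);
        restored.closeSession(7, 3); // replay finds nothing to delete

        return new long[] { live.pzxid.get("/app"), restored.pzxid.get("/app") };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("live pzxid=" + r[0] + ", restored pzxid=" + r[1]);
    }
}
```

In this model the live tree ends with pzxid 3 on /app, while the tree rebuilt
from the snapshot plus txn replay stays at pzxid 2.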
The inconsistent pzxid can lead to missed watch notifications when a client
reconnects with setWatches, because the pzxid is stale.
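
To see why a stale pzxid matters on reconnect, here is a simplified sketch of
a pzxid comparison for a re-registered child watch. It assumes the server
decides whether to fire a children-changed event by comparing the parent's
pzxid with the last zxid the client saw; the method name and the values are
hypothetical, and the real setWatches code path handles more cases.

```java
// Simplified sketch: hypothetical helper, not the server's actual setWatches code.
final class SetWatchesCheck {
    // True if the children changed after the client's last seen zxid, i.e. the
    // server should notify immediately instead of silently re-arming the watch.
    static boolean childrenChangedSince(long parentPzxid, long clientLastZxid) {
        return parentPzxid > clientLastZxid;
    }

    public static void main(String[] args) {
        long clientLastZxid = 2; // client saw the child created, then disconnected
        System.out.println("correct pzxid 3 -> notify: "
                + childrenChangedSince(3, clientLastZxid));
        System.out.println("stale pzxid 2   -> notify: "
                + childrenChangedSince(2, clientLastZxid));
    }
}
```

With the correct pzxid the client is told about the deletion; with the stale
one the watch is simply re-registered and the change is never reported.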
This JIRA is going to fix the issue by adding a CloseSessionTxn, which will
record all the nodes deleted by that CloseSession txn, so that we know
which nodes to update when replaying the txn.
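
A rough sketch of the proposed direction, under stated assumptions: the field
name deletedPaths, the replay helper and the map-based tree are hypothetical
and only illustrate the idea of recording the deleted paths in the txn. The
real record layout and DataTree changes may look different.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical types, not the actual ZooKeeper implementation.
final class CloseSessionTxnSketch {
    // Hypothetical txn payload: the paths removed when the session was closed.
    static final class CloseSessionTxn {
        final List<String> deletedPaths;
        CloseSessionTxn(List<String> deletedPaths) { this.deletedPaths = deletedPaths; }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    // Replays the txn against a path -> pzxid map. Because the paths come from
    // the txn itself, the parent's pzxid is patched even if the child is
    // already missing from the snapshot.
    static void replay(Map<String, Long> pzxid, CloseSessionTxn txn, long zxid) {
        for (String path : txn.deletedPaths) {
            pzxid.remove(path); // no-op if the node is already gone
            String parent = parentOf(path);
            if (pzxid.containsKey(parent)) {
                pzxid.merge(parent, zxid, Long::max); // never move pzxid backwards
            }
        }
    }

    // Restores a tree where /app still has the stale pzxid 2, then replays
    // the txn (zxid 3) the given number of times.
    static long replayedParentPzxid(int times) {
        Map<String, Long> restored = new HashMap<>();
        restored.put("/", 1L);
        restored.put("/app", 2L);
        CloseSessionTxn txn = new CloseSessionTxn(Collections.singletonList("/app/lock"));
        for (int i = 0; i < times; i++) {
            replay(restored, txn, 3L);
        }
        return restored.get("/app");
    }

    public static void main(String[] args) {
        System.out.println("after 1 replay:  " + replayedParentPzxid(1));
        System.out.println("after 2 replays: " + replayedParentPzxid(2));
    }
}
```

Replaying once or twice leaves /app with the same pzxid, which is the
idempotency the current CloseSession handling lacks.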