BookKeeper has a wrapper class for the ZooKeeper client called ZooKeeperClient.
Its purpose appears to be to transparently retry an operation when ZooKeeper
returns ConnectionLoss for it due to a Disconnect event.
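
To make the discussion concrete, here is a rough sketch of the retry pattern
as I understand it. The names and structure are mine, not the actual
ZooKeeperClient code:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Illustrative retry-on-ConnectionLoss wrapper; not the real ZooKeeperClient.
    public final class RetryingCreate {
        public static String createWithRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                     CreateMode.PERSISTENT);
                } catch (KeeperException.ConnectionLossException e) {
                    // The outcome of the failed attempt is unknown: the create
                    // may or may not have been applied on the server. A blind
                    // retry treats it as if it had not been.
                    Thread.sleep(100);
                }
            }
        }
    }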

The trouble is that a write which received a ConnectionLoss error may in fact
have succeeded. When ZooKeeperClient retries, it gets back NodeExists and
propagates that error to the caller, even though the caller's own write
succeeded and the node did not previously exist.
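
Here is how a caller that relies on create() atomicity to detect a competing
writer would misread that NodeExists, assuming a retry wrapper like the
RetryingCreate sketch above (again, hypothetical names):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // If the first create was applied but reported as ConnectionLoss, the
    // retried create returns NodeExists and this method returns false,
    // even though no other writer ever touched the path.
    public final class FirstWriterCheck {
        public static boolean tryClaim(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            try {
                RetryingCreate.createWithRetry(zk, path, data);
                return true;   // we believe we created the node first
            } catch (KeeperException.NodeExistsException e) {
                return false;  // "someone else" created it -- possibly just us
            }
        }
    }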

It seems as though the same issue would hold for setData and delete calls that
use the version argument -- you could get a spurious BadVersion.
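
The analogous sketch for a versioned setData (same hypothetical retry pattern):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Compare-and-set via setData(path, data, expectedVersion). If the first
    // attempt is applied but reported as ConnectionLoss, the znode version has
    // already advanced, so the retry fails with BadVersion even though no
    // other writer intervened.
    public final class RetryingCas {
        public static void casWithRetry(ZooKeeper zk, String path, byte[] data,
                                        int expectedVersion)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    zk.setData(path, data, expectedVersion);
                    return;
                } catch (KeeperException.ConnectionLossException e) {
                    // Outcome unknown; the retry may see a spurious BadVersion.
                    Thread.sleep(100);
                }
            }
        }
    }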

I believe I've reproduced the variant with a spurious NodeExists. It
manifested as a spurious BKLedgerExistException when running against a
3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instances'
storage, injecting write delays to force Disconnect events. This seems to
make sense, as AbstractZkLedgerManager.createLedgerMetadata uses
ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and appears
to depend on the atomicity of create to avoid two writers overwriting each
other's nodes.

AbstractZkLedgerManager.writeLedger would seem to have the same problem, given
its reliance on setData with the version argument to avoid lost updates.

Am I missing something in this analysis? It seems to me that this behavior
could be genuinely problematic during periods of spotty connectivity to the
ZooKeeper cluster.

Thanks!
-Sam
