Hi Sam,

Let's assume there is no such retry logic. How would you expect to handle
this situation?

Could your application try to create a new ledger, or catch the NodeExists
exception?
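
For example, a rough sketch of what catching it could look like
(metadataMatches is a hypothetical helper here, not a BookKeeper API):

    try {
        zk.create(ledgerPath, metadata, acls, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
        // The node may have been created by our own earlier attempt
        // that got ConnectionLoss; read it back and compare before
        // failing.
        byte[] existing = zk.getData(ledgerPath, false, null);
        if (!metadataMatches(existing, metadata)) {
            throw e;  // a different writer really owns this ledger
        }
        // Otherwise treat the create as having succeeded.
    }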

- Sijie

On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:

> Hmm, Curator seems to have essentially the same problem:
> https://issues.apache.org/jira/browse/CURATOR-268
> I'm not sure there's a good way to solve this transparently -- the right
> answer is probably to plumb the ConnectionLoss event through
> ZooKeeperClient for interested callers, audit the metadata users where we
> depend on atomicity, and update each one to handle it appropriately.
> -Sam
>
> On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
>
> > BookKeeper has a wrapper class for the ZooKeeper client called
> > ZooKeeperClient.  Its purpose appears to be to transparently perform
> > retries in the case that ZooKeeper returns ConnectionLoss on an
> > operation due to a Disconnect event.
> >
> > The trouble is that a write which received a ConnectionLoss error as a
> > return value may actually have succeeded.  Once ZooKeeperClient
> > retries, it'll get back NodeExists and propagate that error to the
> > caller, even though the original write succeeded and the node did not
> > exist before it.
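> >
> > In other words, the wrapper's retry effectively does something like
> > this (a simplified sketch, not the actual ZooKeeperClient code):
> >
> >     try {
> >         zk.create(path, data, acls, CreateMode.PERSISTENT);
> >     } catch (KeeperException.ConnectionLossException e) {
> >         // The first create may have been applied on the server even
> >         // though the client never saw the response...
> >         zk.create(path, data, acls, CreateMode.PERSISTENT);
> >         // ...so the retry can throw NodeExistsException for a node
> >         // that we ourselves created.
> >     }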
> >
> > It seems as though the same issue would hold for setData and delete
> > calls which use the version argument -- you could get a spurious
> > BadVersion.
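> >
> > For example, with the same simplified retry shape:
> >
> >     try {
> >         zk.setData(path, data, expectedVersion);
> >     } catch (KeeperException.ConnectionLossException e) {
> >         // If the first setData was applied, the node's version has
> >         // already advanced past expectedVersion...
> >         zk.setData(path, data, expectedVersion);
> >         // ...so the retry throws BadVersionException even though our
> >         // update actually succeeded.
> >     }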
> >
> > I believe I've reproduced the variant with a spurious NodeExists.  It
> > manifested as a spurious BKLedgerExistException when running against a
> > 3-instance ZooKeeper cluster, with dm-delay under the ZooKeeper
> > instance storage to force Disconnect events by injecting write delays.
> > This makes sense, as AbstractZkLedgerManager.createLedgerMetadata uses
> > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> > appears to depend on create's atomicity to keep two writers from
> > overwriting each other's nodes.
> >
> > AbstractZkLedgerManager.writeLedger would seem to have the same
> > problem, given its dependence on setData with the version argument to
> > avoid lost updates.
> >
> > Am I missing something in this analysis?  It seems to me that this
> > behavior could be genuinely problematic during periods of spotty
> > connectivity to the ZooKeeper cluster.
> >
> > Thanks!
> > -Sam
> >
>
