[
https://issues.apache.org/jira/browse/ZOOKEEPER-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Slocum resolved ZOOKEEPER-3927.
------------------------------------
Resolution: Duplicate
> ZooKeeper Client Fault Tolerance Extensions
> -------------------------------------------
>
> Key: ZOOKEEPER-3927
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3927
> Project: ZooKeeper
> Issue Type: Improvement
> Reporter: Josh Slocum
> Priority: Minor
>
> TL;DR: My team at Indeed has developed ZooKeeper functionality to handle
> stateful retrying of write operations on ConnectionLoss, and we wanted to
> reach out to discuss whether this is something the ZooKeeper team may be
> interested in incorporating into the ZooKeeper client or in a separate
> wrapper.
>
> Hi ZooKeeper Devs,
> My team uses ZooKeeper extensively as part of a distributed key-value store
> we've built at Indeed (think HBase replacement). Because our deployment
> co-locates our database daemons with our large Hadoop cluster, and many of
> our compute jobs are network-intensive, we were experiencing a large number
> of transient ConnectionLoss errors. This was especially problematic for
> important write operations, such as the creation/deletion of distributed
> locks/leases or updates to distributed state in the cluster.
> We saw that some existing ZooKeeper client wrappers handle retrying in the
> presence of ConnectionLoss, but all of the ones we looked at
> ([Curator|https://curator.apache.org/],
> [Kazoo|https://github.com/python-zk/kazoo], etc.) retry writes the same way
> as reads: blindly in a loop. This meant that when retrying a create, for
> example, if the initial create had succeeded on the server but the client got
> ConnectionLoss, we would get a NodeExists exception on the retried request,
> even though the znode had been created. This caused many issues. In the
> distributed lock/lease example, to other nodes it looked like the calling
> node had successfully acquired the "lock", while to the calling node itself
> it appeared that it had failed to acquire the "lock", which resulted in a
> deadlock.
> To solve this, we implemented a set of "connection-loss tolerant primitives"
> for the main types of write operations. They handle a ConnectionLoss by
> retrying the operation in a loop, but when the retry itself fails, they
> inspect the current state to determine whether a previous attempt that got
> ConnectionLoss had actually succeeded on the server.
> * createRetriable(String path, byte[] data)
> * setDataRetriable(String path, byte[] newData, int currentVersion)
> * deleteRetriable(String path, int currentVersion)
> * compareAndDeleteRetriable(String path, byte[] currentData, int
> currentVersion)
> For example, createRetriable retries the create on ConnectionLoss. If the
> retried call gets a NodeExists exception, it checks whether getData(path)
> equals data and the znode's dataVersion is 0. If so, it assumes the first
> create succeeded and returns success; otherwise it propagates the
> NodeExists exception.
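> To make the createRetriable behavior concrete, here is a minimal, runnable
> sketch in Python against a stub client. All names here are hypothetical and
> this is not the actual implementation; the stub simulates the failure mode
> where the server applies the create but the client's reply is lost:

```python
class ConnectionLoss(Exception):
    pass

class NodeExists(Exception):
    pass

class StubClient:
    """Stub ZooKeeper-like client that drops the reply to the first create."""
    def __init__(self):
        self.store = {}            # path -> (data, version)
        self.drop_next_reply = True

    def create(self, path, data):
        if path in self.store:
            raise NodeExists(path)
        self.store[path] = (data, 0)   # server applies the write...
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionLoss()     # ...but the client never sees the reply

    def get_data(self, path):
        return self.store[path]        # returns (data, version)

def create_retriable(client, path, data):
    """Retry create on ConnectionLoss; on NodeExists, check whether a
    previous attempt that got ConnectionLoss actually succeeded
    (same data, dataVersion == 0)."""
    while True:
        try:
            client.create(path, data)
            return
        except ConnectionLoss:
            continue                   # stateful retry
        except NodeExists:
            current_data, version = client.get_data(path)
            if current_data == data and version == 0:
                return                 # our earlier create won; treat as success
            raise                      # someone else owns the znode

client = StubClient()
create_retriable(client, "/locks/lease-1", b"owner-a")
```

> A naive retry loop against the same stub would surface NodeExists to the
> caller, which is exactly the false "lock acquisition failed" scenario
> described above.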
> These primitives have allowed us to program our ZooKeeper layer as if
> ConnectionLoss were not a transient state we have to worry about, since they
> provide essentially the same guarantees as the non-retriable functions in the
> ZooKeeper API (with a slight difference in semantics).
> Because this problem is not solved anywhere else in the ZooKeeper ecosystem
> (to my knowledge), we think it could be a useful contribution to the
> ZooKeeper project.
> However, if you are not looking for contributions that extend the ZooKeeper
> API and prefer client extensions to live separately (for example, in
> Curator), then we would consider contributing there or open-sourcing our
> implementation as a standalone library.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)