[ https://issues.apache.org/jira/browse/ZOOKEEPER-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Slocum resolved ZOOKEEPER-3927.
------------------------------------
    Resolution: Duplicate

> ZooKeeper Client Fault Tolerance Extensions
> -------------------------------------------
>
>                 Key: ZOOKEEPER-3927
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3927
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Josh Slocum
>            Priority: Minor
>
> Tl;dr: My team at Indeed has developed ZooKeeper functionality to handle 
> stateful retrying of ConnectionLoss for write operations, and we wanted to 
> reach out to discuss whether this is something the ZooKeeper team may be 
> interested in incorporating into the ZooKeeper client or into a separate 
> wrapper.
>  
> Hi ZooKeeper Devs,
> My team uses ZooKeeper extensively as part of a distributed key-value store 
> we've built at Indeed (think HBase replacement). Because our deployment 
> co-locates our database daemons with our large Hadoop cluster, and many of 
> our compute jobs are network-intensive, we were experiencing a large number 
> of transient ConnectionLoss errors. This was especially problematic for 
> important write operations, such as the creation/deletion of distributed 
> locks/leases or updates to distributed state in the cluster.
> We saw that some existing ZooKeeper client wrappers handled retrying in the 
> presence of ConnectionLoss, but all of the ones we looked at 
> ([Curator|https://curator.apache.org/], 
> [Kazoo|https://github.com/python-zk/kazoo], etc.) retried writes the same 
> way as reads: blindly in a loop. This meant that upon retrying a create, for 
> example, if the initial create had succeeded on the server but the client 
> got ConnectionLoss, we would get a NodeExists exception on the retried 
> request even though the znode was created. This caused many issues. In the 
> distributed lock/lease example, to other nodes it looked like the calling 
> node had successfully acquired the "lock", while to the calling node it 
> appeared that it had failed to acquire the "lock", which resulted in a 
> deadlock.
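> In terms of the plain ZooKeeper Java API, the blind-retry failure mode looks 
> roughly like the following sketch (illustrative only; the method name is 
> hypothetical):
> {code:java}
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooDefs;
> import org.apache.zookeeper.ZooKeeper;
> 
> // Blind retry: loops on ConnectionLoss, but cannot distinguish "my earlier
> // create actually committed" from "someone else created the node", so a
> // NodeExistsException escapes even when this client's create succeeded.
> static String createBlindRetry(ZooKeeper zk, String path, byte[] data)
>         throws KeeperException, InterruptedException {
>     while (true) {
>         try {
>             return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>                              CreateMode.PERSISTENT);
>         } catch (KeeperException.ConnectionLossException e) {
>             // Fall through and retry; the fate of the lost request is unknown.
>         }
>     }
> }
> {code}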
> To solve this, we implemented a set of "connection-loss tolerant primitives" 
> for the main types of write operations. They handle a connection loss by 
> retrying the operation in a loop, but when a retry fails, they inspect the 
> current state to determine whether a previous attempt that got 
> ConnectionLoss actually succeeded:
> * createRetriable(String path, byte[] data)
> * setDataRetriable(String path, byte[] newData, int currentVersion)
> * deleteRetriable(String path, int currentVersion)
> * compareAndDeleteRetriable(String path, byte[] currentData, int currentVersion)
> For example, createRetriable retries the create on connection loss. If the 
> retried call gets a NodeExists exception, it checks whether getData(path) == 
> data and dataVersion == 0. If so, it assumes the first create succeeded and 
> returns success; otherwise it propagates the NodeExists exception.
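> A simplified sketch of that recovery check (not our exact implementation; it 
> assumes a synchronous ZooKeeper handle and persistent znodes):
> {code:java}
> import java.util.Arrays;
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooDefs;
> import org.apache.zookeeper.ZooKeeper;
> import org.apache.zookeeper.data.Stat;
> 
> // Connection-loss tolerant create: on NodeExists after a ConnectionLoss,
> // inspect the node to decide whether our own earlier attempt committed.
> static void createRetriable(ZooKeeper zk, String path, byte[] data)
>         throws KeeperException, InterruptedException {
>     boolean sawConnectionLoss = false;
>     while (true) {
>         try {
>             zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>                       CreateMode.PERSISTENT);
>             return;
>         } catch (KeeperException.ConnectionLossException e) {
>             sawConnectionLoss = true; // retry; the lost request may have won
>         } catch (KeeperException.NodeExistsException e) {
>             if (sawConnectionLoss) {
>                 Stat stat = new Stat();
>                 byte[] current = zk.getData(path, false, stat);
>                 // Our payload at dataVersion 0 => our earlier create succeeded.
>                 if (stat.getVersion() == 0 && Arrays.equals(current, data)) {
>                     return;
>                 }
>             }
>             throw e; // the znode was genuinely created by someone else
>         }
>     }
> }
> {code}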
> These primitives have allowed us to program our ZooKeeper layer as if 
> ConnectionLoss were not a transient state we have to worry about, since they 
> provide essentially the same guarantees as the non-retriable functions in 
> the ZooKeeper API (with a slight difference in semantics).
> Because, to my knowledge, this problem is not solved anywhere else in the 
> ZooKeeper ecosystem, we think it could be a useful contribution to the 
> ZooKeeper project.
> However, if you are not looking for contributions that extend the ZooKeeper 
> API and prefer client extensions to live separately (as Curator does), we 
> would consider contributing there or open-sourcing our implementation as a 
> standalone library.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
