[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632811#comment-15632811
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
------------------------------------------

multi will handle some of the use cases, but a simple one that it doesn't 
handle is if you want to implement swap:

zk.getData(znode, ...)
zk.setData(znode, ...)

you can't do that with multi (and i don't think we should extend multi to do it 
:)

mutli also doesn't handle the case when you are updating lots of data and would 
go over max packet size.



> Client library reconnecting breaks FIFO client order
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2619
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> ZKConnection would return a Connection Loss error. Upon noticing, the client 
> would need to call ZooKeeper.getConnection() again to get a working 
> ZKConnection object, and it would execute its recovery procedure on this new 
> connection.
>  - Deprecate all asynchronous methods on the ZooKeeper object. These are 
> unsafe to use if the caller assumes they're getting FIFO client order.
>  - No changes to the protocols or servers are required.
> I recognize this could cause a lot of code churn for both ZooKeeper and 
> projects that use it. On the other hand, the existing asynchronous calls in 
> applications should now be audited anyhow.
> The code affected by this issue may be difficult to contain:
>  - It likely affects all ZooKeeper client libraries that provide both 
> asynchronous operations and transparent reconnection. That's probably all 
> versions of the official Java client library, as well as most other client 
> libraries.
>  - It affects all applications using those libraries that depend on the FIFO 
> client order of asynchronous operations. I don't know how common that is, but 
> the paper implies that FIFO client order is important.
>  - Fortunately, the issue can only manifest itself when connections are lost 
> and transparently reestablished. In practice, it may also require a long 
> pipeline or a significant delay in the application thread while the library 
> establishes a new connection.
>  - In case you're wondering, this issue occurred to me while working on a new 
> client library for Go. I haven't seen this issue in the wild, but I was able 
> to produce it locally by placing sleep statements in a Java program and 
> closing its TCP connections.
> I'm new to this community, so I'm looking forward to the discussion. Let me 
> know if I can clarify any of the above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to