[
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616890#comment-15616890
]
Diego Ongaro edited comment on ZOOKEEPER-2619 at 10/28/16 11:11 PM:
--------------------------------------------------------------------
bq. Some of the issues you're describing, we only had the opportunity to
discuss in more detail in the book. It is really not self-promotion, and it is
where we had a chance to talk more about some of the subtleties we have.
--[~fpj]
Indeed, page 116-117 of the ZooKeeper: Distributed Process Coordination book
mentions this issue. It suggests:
bq. If Op2 depends on Op1, ... wait for a successful execution of Op1 before
submitting Op2...this approach is safe, but it adds a performance penalty...
was (Author: ongardie):
bq. Some of the issues you're describing, we only had the opportunity to
discuss in more detail in the book. It is really not self-promotion, and it is
where we had a chance to talk more about some of the subtleties we have.
--[~fpj]
Indeed, page 116-117 of the ZooKeeper: Distributed Process Coordination book
mentions this issue. It suggests:
bq. If Op2 depends on Op1, ... wait for a successful of Op1 before submitting
Op2...this approach is safe, but it adds a performance penalty...
> Client library reconnecting breaks FIFO client order
> ----------------------------------------------------
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
> ZooKeeper provides "FIFO client order: all requests from a given client are
> executed in the order that they were sent by the client." I believe
> applications written using the Java client library are unable to rely on this
> guarantee, and any current application that does so is broken. Other client
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described
> on Page 4 (right column) of the paper:
> {code}
> zk = new ZooKeeper(...)
> zk.createAsync("/data-23857", "...", callback)
> zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers.
> Applying the above definition, if the ZooKeeper database contains /pointer,
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
> zk = new ZooKeeper(...)
> // The library establishes a TCP connection.
> zk.createAsync("/data-23857", "...", callback)
> // The library/kernel closes the TCP connection because it times out, and
> // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
> // that it never reaches the server.
> // The library establishes a new TCP connection.
> zk.createSync("/pointer", "/data-23857")
> // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following
> three properties:
> # FIFO client order for asynchronous operations,
> # Failing operations when connections are lost, AND
> # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined
> operations, then upon noticing a connection loss, issues a series of recovery
> operations, repeating the recovery procedure as necessary. If a pipelined
> operation fails, all subsequent operations in the pipeline must also fail.
> Yet the client must also carry on eventually: the recovery operations cannot
> be trivially failed forever. Unfortunately, the client library does not know
> where the pipelined operations end and the recovery operations begin. At the
> time of a connection loss, subsequent pipelined operations may or may not be
> queued in the library; others might be upcoming in the application thread. If
> the library re-establishes a connection too early, it will send pipelined
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its
> callbacks and watchers for connection loss events, and do its best to stop
> the subsequent pipelined operations at the first sign of a connection loss.
> In addition to being a large burden for the application, this does not solve
> the problem all the time. In particular, if the callback thread is delayed
> significantly (as can happen due to excessive computation or scheduling
> hiccups), the application may not learn about the connection loss event until
> after the connection has been re-established and after dependent pipelined
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
> - Add a method ZooKeeper.getConnection() returning a ZKConnection object.
> ZKConnection would wrap a TCP connection. It would include all synchronous
> and asynchronous operations currently defined on the ZooKeeper class. Upon a
> connection loss on a ZKConnection, all subsequent operations on the same
> ZKConnection would return a Connection Loss error. Upon noticing, the client
> would need to call ZooKeeper.getConnection() again to get a working
> ZKConnection object, and it would execute its recovery procedure on this new
> connection.
> - Deprecate all asynchronous methods on the ZooKeeper object. These are
> unsafe to use if the caller assumes they're getting FIFO client order.
> - No changes to the protocols or servers are required.
> I recognize this could cause a lot of code churn for both ZooKeeper and
> projects that use it. On the other hand, the existing asynchronous calls in
> applications should now be audited anyhow.
> The code affected by this issue may be difficult to contain:
> - It likely affects all ZooKeeper client libraries that provide both
> asynchronous operations and transparent reconnection. That's probably all
> versions of the official Java client library, as well as most other client
> libraries.
> - It affects all applications using those libraries that depend on the FIFO
> client order of asynchronous operations. I don't know how common that is, but
> the paper implies that FIFO client order is important.
> - Fortunately, the issue can only manifest itself when connections are lost
> and transparently reestablished. In practice, it may also require a long
> pipeline or a significant delay in the application thread while the library
> establishes a new connection.
> - In case you're wondering, this issue occurred to me while working on a new
> client library for Go. I haven't seen this issue in the wild, but I was able
> to produce it locally by placing sleep statements in a Java program and
> closing its TCP connections.
> I'm new to this community, so I'm looking forward to the discussion. Let me
> know if I can clarify any of the above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)