[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606298#comment-15606298
 ] 

Diego Ongaro commented on ZOOKEEPER-2619:
-----------------------------------------

bq. I don't like much the idea of exposing a "connection". The abstraction we 
have historically is a session and the fact that a session can have multiple 
connections over time is mostly transparent to the application, except for the 
connection loss event, which makes it clear that there are multiple connections 
going on underneath. --[~fpj]

Nobody wants ZOOKEEPER-22 more than I do ;). It would solve this issue and make 
correct ZooKeeper usage so much simpler. But until ZOOKEEPER-22 is done, the 
concept of a connection is not hidden from applications, and exposing 
connections may be the most straightforward and honest thing to do. Even with 
the FIFO token, the application would need to know to grab a new FIFO token 
after seeing a connection loss event, so FIFO tokens can't be a fully separate 
concept from connections.

Otherwise, the FIFO token and {{getConnection()}} are equivalent in terms of 
their expressiveness, so I see the choice as a question of aesthetics.
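
For concreteness, here's a minimal sketch of what the FIFO-token variant might 
look like. {{FifoToken}} and {{getFifoToken()}} are hypothetical names (not an 
existing or proposed API), and the callback argument is assumed to work like 
the current async calls:

{code}
// Hypothetical FIFO-token variant, for comparison with getConnection().
FifoToken token = zk.getFifoToken();
zk.deleteAsync(token, "/ready", logStuff);
zk.createAsync(token, "/config-0", "value0", EPHEMERAL, logStuff);
// After a connection loss, every operation tagged with this token fails, and
// the application must grab a fresh token -- just as it would grab a new
// ZKConnection -- before issuing its recovery operations.
{code}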


bq. Another point I was thinking about with respect to the use of getConnection 
is the following. If I'm a developer, where should I call it and when? In 
principle, we can do it upon receiving a SyncConnected event via the default 
watcher (transition from Disconnected to SyncConnected). It might not be 
possible to do it at that point in the case I have one or more threads 
submitting operations, though. The applications will need to pause before we 
get the new connection, otherwise we can have a mix of threads sending to the 
invalid connection while others send to the new connection. In the case it is 
ok that the FIFO order guarantee is per thread, we can have each thread 
individually calling getConnection (or whatever other variant we end up 
agreeing upon) upon receiving a connection loss error. --[~fpj]

I'd argue strongly for the latter approach of using connections/FIFO tokens 
only locally. Applications should grab a connection/FIFO token when they're 
doing asynchronous work, then forget it and grab another one later when needed.

If someone did want the FIFO client order guarantee across threads, they might 
be insane, but they can share their connection/FIFO token across threads. I 
don't think this will be common usage.

Taking the example from page 4 of the paper, here's some possible code:

{code}
// logStuff is a completion callback that just logs the result of each call.
public void newLeader(ZooKeeper zk, ArrayList<String> config) {
    while (true) {
        ZKConnection conn = zk.getConnection();
        // "The new leader makes the configuration change by deleting ready, ..."
        conn.deleteAsync("/ready", logStuff);
        // "...updating the various configuration znodes, ..."
        for (int i = 0; i < config.size(); i++) {
            conn.createAsync("/config-" + i, config.get(i), EPHEMERAL, logStuff);
        }
        // "...and creating ready."
        try {
            conn.createSync("/ready", "", EPHEMERAL);
        } catch (KeeperException.ConnectionLossException e) {
            // The pipeline was cut short; grab a fresh connection and start over.
            continue;
        }
        return;
    }
}
{code}

Note that this code is grabbing a connection (FIFO token) before starting its 
pipeline, using it locally, then dropping it. In case of a connection loss, it 
gets another one.

This could be made more sophisticated by having the callbacks keep track of 
which operations have already succeeded, but it shouldn't be necessary.
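
If someone did want that tracking, here's a rough sketch. It assumes the 
proposed ZKConnection API and that the async calls take a standard 
{{AsyncCallback.StringCallback}}-style completion callback, which is an 
assumption on my part, not part of the proposal:

{code}
// Illustrative only: remember which config znodes were created successfully,
// so a retry after ConnectionLoss can skip re-creating them.
Set<String> created = ConcurrentHashMap.newKeySet();
AsyncCallback.StringCallback trackStuff = (rc, path, ctx, name) -> {
    if (rc == KeeperException.Code.OK.intValue()) {
        created.add(path);
    }
};
// Inside the retry loop, only re-issue creates that haven't succeeded yet:
//     if (!created.contains("/config-" + i)) {
//         conn.createAsync("/config-" + i, config.get(i), EPHEMERAL, trackStuff);
//     }
{code}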

We could also sprinkle in:
{code}
        if (conn.isLost()) {
            continue;
        }
{code}
as an optimization/pessimization, but again, it shouldn't be necessary.

\\
----
\\

bq. do we have some applications that we can use to validate the api? it would 
be nice to validate the design before we standardize it. --[~breed]

Good point. I think we'd want well-intentioned applications that make async 
calls, maybe some that care about FIFO order and some that don't. Any ideas?


There are a couple more alternatives I want to raise for discussion:

One possible stance would be: _Well, this adds too much complexity, so you 
can't have FIFO client order until ZOOKEEPER-22 is done_, WONTFIX.

Another, more extreme stance would be to eliminate the FIFO client order 
guarantee altogether. A big question I've been wondering about is whether FIFO 
client order is useful in real-world applications, and whether it's worth the 
implementation trouble.

FIFO client order provides two potential benefits:
 - FIFO client order may improve application performance: maintaining the 
guarantee does not add significant runtime cost in general, yet applications 
that leverage it can avoid round-trip stalls.
 - FIFO client order may simplify reasoning about applications by reducing the 
possible state space that they operate in.

On the other hand:

 - FIFO client order adds complexity to the implementation. It has to be 
guaranteed at every level of the stack. This issue shows how it was lost in the 
highest levels of the client API. Even if we fixed this issue, Curator or other 
wrappers might still throw away FIFO client order at even higher levels. And at 
lower levels, though the current implementation maintains FIFO client order, 
this has widespread implications restricting its design, including its 
consensus algorithm, its threading model, and its choice of transport protocol.
 - Relatedly, solving ZOOKEEPER-22 would be simpler without the FIFO client 
order guarantee, since the library could then retry operations in any order 
(both reads and writes).
 - How many applications rely on FIFO client order? Users didn't seem to notice 
this bug. Does that imply they're doing just fine without FIFO client order, or 
are the conditions for this bug so rare that it did not come up in production 
or testing?
 - Using FIFO client order correctly seems difficult. My above sample code, for 
instance, looks simple but is quite subtle and fragile. It assumes the 
asynchronous operations will succeed as long as the connection is not lost 
(existing znodes in the way? ACLs? quotas after ZOOKEEPER-451?), and it only 
avoids leaving garbage behind on client crashes by using ephemeral nodes. 
Compare it to a version that issues all the createAsync requests, blocks on 
their successful completion (possibly recovering from errors in nontrivial 
ways), and then creates the /ready node (sketched below). I think it'd be 
easier to see why that version works, at the runtime cost of an extra 
round-trip delay.
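
Here's a minimal sketch of that blocking version, again against the 
hypothetical ZKConnection API, with real error recovery elided and the 
callback signature assumed to be StringCallback-like:

{code}
public void newLeaderBlocking(ZooKeeper zk, ArrayList<String> config) throws Exception {
    ZKConnection conn = zk.getConnection();
    conn.deleteSync("/ready");
    CountDownLatch done = new CountDownLatch(config.size());
    AtomicBoolean failed = new AtomicBoolean(false);
    for (int i = 0; i < config.size(); i++) {
        conn.createAsync("/config-" + i, config.get(i), EPHEMERAL,
            (rc, path, ctx, name) -> {
                if (rc != KeeperException.Code.OK.intValue()) {
                    failed.set(true);  // real error recovery elided
                }
                done.countDown();
            });
    }
    // Wait for every config create to complete before announcing readiness.
    // This is the extra round-trip delay, but /ready no longer depends on
    // FIFO client order.
    done.await();
    if (!failed.get()) {
        conn.createSync("/ready", "", EPHEMERAL);
    }
}
{code}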

I know this is only tangentially related to this issue, but I'm really 
interested in hearing your thoughts.

> Client library reconnecting breaks FIFO client order
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2619
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> ZKConnection would return a Connection Loss error. Upon noticing, the client 
> would need to call ZooKeeper.getConnection() again to get a working 
> ZKConnection object, and it would execute its recovery procedure on this new 
> connection.
>  - Deprecate all asynchronous methods on the ZooKeeper object. These are 
> unsafe to use if the caller assumes they're getting FIFO client order.
>  - No changes to the protocols or servers are required.
> I recognize this could cause a lot of code churn for both ZooKeeper and 
> projects that use it. On the other hand, the existing asynchronous calls in 
> applications should now be audited anyhow.
> The code affected by this issue may be difficult to contain:
>  - It likely affects all ZooKeeper client libraries that provide both 
> asynchronous operations and transparent reconnection. That's probably all 
> versions of the official Java client library, as well as most other client 
> libraries.
>  - It affects all applications using those libraries that depend on the FIFO 
> client order of asynchronous operations. I don't know how common that is, but 
> the paper implies that FIFO client order is important.
>  - Fortunately, the issue can only manifest itself when connections are lost 
> and transparently reestablished. In practice, it may also require a long 
> pipeline or a significant delay in the application thread while the library 
> establishes a new connection.
>  - In case you're wondering, this issue occurred to me while working on a new 
> client library for Go. I haven't seen this issue in the wild, but I was able 
> to produce it locally by placing sleep statements in a Java program and 
> closing its TCP connections.
> I'm new to this community, so I'm looking forward to the discussion. Let me 
> know if I can clarify any of the above.



