[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859210#comment-15859210
 ] 

Kfir Lev-Ari edited comment on ZOOKEEPER-2684 at 2/9/17 8:41 AM:
-----------------------------------------------------------------

[~nerdyyatrice], can you please describe the scenario in which the same request 
is processed in the queue twice? 

As I see it, if a request r is received from a local client, then r is added to 
the queue (note that r was already sent to the leader prior to that point).

Once a commit arrives from the leader, r is processed, and r will not go back 
into the queue, regardless of a possible client disconnection (AFAIK, the 
connection is only needed at the end of the line, when some kind of result is 
returned).

Now, let's say the client gets disconnected at some point in the time frame 
above, while r is being processed, and reconnects to some server (the same 
server or a different one). 

If a commit arrives at a different server, r will be processed as if it belongs 
to a remote client, i.e., we will only perform the update, without using the 
connection. I'm not sure that after a disconnection ZK is required to inform 
the client's new session of its past actions (but I guess that can also be 
fixed if needed).

If a commit arrives and r is in the queue waiting for it, then it is processed 
as if it belongs to a locally connected client, but eventually the connection 
handle will show that the connection has ended (if I remember the code 
correctly), so there is nothing to report, and ZK continues as usual. 
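
To make the two cases above concrete, here is a minimal toy model of what I 
mean. It is only a sketch and not the actual CommitProcessor code: the class 
name, the method names and the string ids are all made up for illustration. 
The point it shows is that a commit removes r from the pending queue exactly 
once, and a commit for a request that is not pending is applied as a remote 
update.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only; this is NOT ZooKeeper's CommitProcessor.
// All names here are hypothetical and chosen for illustration.
class ToyCommitProcessor {
    // Requests from locally connected clients, waiting for the leader's commit.
    private final Deque<String> pendingLocal = new ArrayDeque<>();
    // Record of every applied commit and how it was handled.
    private final List<String> applied = new ArrayList<>();

    // Hypothetical request identity: session id plus cxid, both in hex.
    static String id(long sessionId, int cxid) {
        return Long.toHexString(sessionId) + ":" + Integer.toHexString(cxid);
    }

    // A local client submitted a write (assumed already forwarded to the leader).
    void submitLocal(long sessionId, int cxid) {
        pendingLocal.addLast(id(sessionId, cxid));
    }

    // A commit arrived from the leader.
    void commit(long sessionId, int cxid, boolean connectionAlive) {
        String r = id(sessionId, cxid);
        if (pendingLocal.remove(r)) {
            // r was waiting here: handle it as a local request. The connection
            // only matters at the very end, when a result would be returned.
            applied.add(r + (connectionAlive ? " local, reply sent" : " local, no reply"));
        } else {
            // r is not pending on this server: only perform the update.
            applied.add(r + " remote");
        }
    }

    List<String> applied() { return applied; }

    int pending() { return pendingLocal.size(); }
}
```

In this model, calling submitLocal and then commit for the same request leaves 
the queue empty, so a second commit for the same id would be treated as a 
remote update rather than pulling r from the queue again.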

Note that if a client writes something with a lower cxid than r, the commit 
processor doesn't track that, i.e., it is possible that the next head after r 
will have a lower cxid than r. We only care about the order of commits that we 
receive from the leader, and that order can't change, because it is based on 
the network protocol order of messages (i.e., if r was already sent to the 
leader, then clearly r is committed prior to any new message from the same 
client). 
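
As a rough illustration of the ordering argument, the sketch below models one 
session's queue as a plain FIFO and matches each commit against the head by 
equality only. This is an assumption about how such a queue could behave, not 
a copy of the real implementation, and the class and method names are 
hypothetical. It shows that a head with a lower cxid than its predecessor is 
not a problem as long as commits arrive in the order the requests were sent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy per-session FIFO; NOT the real per-session queue in ZooKeeper.
class SessionQueueSketch {
    // cxids of this session's requests, in the order they were sent to the leader.
    private final Deque<Integer> queue = new ArrayDeque<>();

    void enqueue(int cxid) {
        queue.addLast(cxid);
    }

    // Returns true if the commit matches the head of the queue and was consumed.
    // Only equality with the head is checked; cxids need not be increasing.
    boolean commit(int cxid) {
        Integer head = queue.peekFirst();
        if (head == null || head != cxid) {
            return false; // nothing pending, or the commit is for another request
        }
        queue.removeFirst();
        return true;
    }

    int size() {
        return queue.size();
    }
}
```

For example, enqueueing cxid 0x20 and then 0x10, and committing them in that 
same order, consumes both entries even though the second cxid is lower.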

Bottom line, it seems like r is processed only once per processor. What am I 
missing?


was (Author: kfirlevari):
[~nerdyyatrice], can you please describe the scenario in which the same request 
is processed in the queue twice? 

As I see it, if a request r is received from a local client, then r is added to 
the queue (note that r was already sent to the leader prior to that point).

Once a commit arrives from the leader, r is processed, and r will not go back 
into the queue, regardless of a possible client disconnection (AFAIK, the 
connection is only needed at the end of the line, when some kind of result is 
returned).

Now, let's say the client gets disconnected at some point in the time frame 
above, while r is being processed, and reconnects to some server (the same 
server or a different one). 

In the patch, if a commit arrives at a different server, r will be processed as 
if it belongs to a remote client, i.e., we will only perform the update, 
without using the connection. I'm not sure that after a disconnection ZK is 
required to inform the client's new session of its past actions (but I guess 
that can also be fixed if needed).

If a commit arrives and r is in the queue waiting for it, then it is processed 
as if it belongs to a locally connected client, but eventually the connection 
handle will show that the connection has ended (if I remember the code 
correctly), so there is nothing to report, and ZK continues as usual. 

Note that if a client writes something with a lower cxid than r, the commit 
processor doesn't track that, i.e., it is possible that the next head after r 
will have a lower cxid than r. We only care about the order of commits that we 
receive from the leader, and that order can't change, because it is based on 
the network protocol order of messages (i.e., if r was already sent to the 
leader, then clearly r is committed prior to any new message from the same 
client). 

Bottom line, it seems like r is processed only once per processor. What am I 
missing?

> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash 
> with the following error:
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> Clearly something is not right in the new commit processor per-session queue 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
