[ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859210#comment-15859210 ]
Kfir Lev-Ari edited comment on ZOOKEEPER-2684 at 2/9/17 8:41 AM:
-----------------------------------------------------------------

[~nerdyyatrice], can you please describe the scenario in which the same request is processed in the queue twice?

As I see it, if a request r is received from a local client, then r is added to the queue (note that r was already sent to the leader prior to that point). Once a commit arrives from the leader, r is processed, and r does not go back into the queue, regardless of a possible client disconnection (AFAIK, the connection is only needed at the end of the line, when some kind of result is returned).

Now, let's say the client gets disconnected at some point in the time frame above, while r is being processed, and connects to some server (the same server or a different one).

If a commit arrives at a different server, r will be processed as if it belongs to a remote client, i.e., we will only perform the update, without using the connection. I'm not sure that, after a disconnection, ZK is required to inform the client's new session about its past actions (but I guess that can also be fixed if needed).

If a commit arrives and r is in the queue waiting for it, then it is processed as if it belongs to a local connected client, but eventually the connection handle will show that the connection has ended (if I remember the code correctly), so there is nothing to report, and ZK continues as usual.

Note that if a client writes something with a lower cxid than r, the commit processor doesn't track such behavior, i.e., it is possible that the next head after r will have a lower cxid than r. We only care about the order of commits that we receive from the leader, and that order can't be changed, because it is based on the network protocol order of messages (i.e., if r was already sent to the leader, then clearly r is committed prior to any new message of the same client).

Bottom line, it seems like r is processed only once per processor. What am I missing?
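
To make the argument above concrete, here is a minimal sketch of the model it describes: a per-session FIFO of local requests that were already forwarded to the leader, drained strictly in the order commits arrive. It is a toy, not the actual CommitProcessor code. The names CommitOrderSketch, Request, Outcome, submitLocal, onCommit and pendingFor are all hypothetical, and the MISMATCH outcome is only an assumed stand-in for the kind of condition that the "Got cxid ... expected ..." log lines in the issue appear to report.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; not org.apache.zookeeper.server.quorum.CommitProcessor. */
final class CommitOrderSketch {

    /** Hypothetical stand-in for a write request: session id plus client cxid. */
    static final class Request {
        final long sessionId;
        final int cxid;

        Request(long sessionId, int cxid) {
            this.sessionId = sessionId;
            this.cxid = cxid;
        }
    }

    enum Outcome { LOCAL, REMOTE, MISMATCH }

    // Per-session FIFO of local requests already forwarded to the leader.
    private final Map<Long, Deque<Request>> pending = new HashMap<>();

    /** A local client's write is queued to wait for its commit. */
    void submitLocal(Request r) {
        pending.computeIfAbsent(r.sessionId, k -> new ArrayDeque<>()).addLast(r);
    }

    /** Called once per commit, in the order the leader sent the commits. */
    Outcome onCommit(Request committed) {
        Deque<Request> q = pending.get(committed.sessionId);
        if (q == null) {
            // No local request is waiting for this session: apply the update only.
            return Outcome.REMOTE;
        }
        if (q.peekFirst().cxid != committed.cxid) {
            // The head of the session queue is not the committed request.
            return Outcome.MISMATCH;
        }
        q.pollFirst(); // r leaves the queue here and is never re-added
        if (q.isEmpty()) {
            pending.remove(committed.sessionId);
        }
        return Outcome.LOCAL;
    }

    /** Number of local requests still waiting for a commit in this session. */
    int pendingFor(long sessionId) {
        Deque<Request> q = pending.get(sessionId);
        return q == null ? 0 : q.size();
    }

    public static void main(String[] args) {
        CommitOrderSketch s = new CommitOrderSketch();
        s.submitLocal(new Request(1L, 7));
        s.submitLocal(new Request(1L, 8));
        System.out.println(s.onCommit(new Request(1L, 7))); // head matches
        System.out.println(s.onCommit(new Request(2L, 3))); // session not local
        System.out.println(s.onCommit(new Request(1L, 9))); // head is cxid 8
    }
}
```

In this model a request can only be consumed once, because the only way out of the queue is the pollFirst() in onCommit, and nothing puts it back. Whether the real processor can reach the mismatch case under reconnects is exactly the open question in this thread.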
> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash with the following error
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> clearly something is not right in the new commit processor per session queue implementation.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)