[ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869349#comment-15869349 ]
Ryan Zhang commented on ZOOKEEPER-2684: --------------------------------------- so here is what actually happened On the server A, the request is processed right after the session connection is closed and the request is forwarded to the lead investigate.log:2017-02-14 23:30:50,373 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x14 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775503086 investigate.log:2017-02-14 23:30:54,327 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x15 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775503083 investigate.log:2017-02-14 23:30:57,888 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x16 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775503084 investigate.log:2017-02-14 23:31:01,583 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x17 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775493180 investigate.log:2017-02-14 23:31:04,867 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x18 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775493181 investigate.log:2017-02-14 23:31:08,150 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getData cxid:0x19 zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1775985079 investigate.log:2017-02-14 23:31:12,672 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:getChildren2 cxid:0x1a zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen investigate.log:2017-02-14 23:31:17,293 - WARN [NIOWorkerThread-16] - Exception causing close of session 0x4006b1b17808b0c: Broken pipe investigate.log:2017-02-14 23:31:17,297 - INFO [FollowerRequestProcessor:4] - Processing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1776069644 investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x27 zxid:0x385012a1e94 txntype:2 reqpath:n/a investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x28 zxid:0x385012a1e95 txntype:1 reqpath:n/a investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x29 zxid:0x385012a1e96 txntype:2 reqpath:n/a investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2a zxid:0x385012a1e98 txntype:1 reqpath:n/a investigate.log:2017-02-14 23:31:21,817 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2b zxid:0x385012a1e9b txntype:2 reqpath:n/a investigate.log:2017-02-14 23:31:21,817 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2c zxid:0x385012a1e9c txntype:1 reqpath:n/a investigate.log:2017-02-14 23:31:21,818 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2d zxid:0x385012a1ea1 txntype:2 reqpath:n/a investigate.log:2017-02-14 23:31:21,822 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2e zxid:0x385012a1eb8 txntype:1 reqpath:n/a investigate.log:2017-02-14 23:31:21,824 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0 txntype:2 reqpath:n/a investigate.log:2017-02-14 23:31:21,833 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9 txntype:1 reqpath:n/a investigate.log:2017-02-14 23:31:22,048 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:error cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:n/a investigate.log:2017-02-14 23:31:22,690 - ERROR [CommitProcessor:4] - Got request sessionid:0x4006b1b17808b0c type:delete cxid:0x27 zxid:0x385012a1e94 txntype:2 reqpath:n/a investigate.log: but we are expecting request sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1776069644 investigate.log:2017-02-14 23:32:39,172 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x4c zxid:0x385012b34fa txntype:1 reqpath:n/a investigate.log:2017-02-14 23:32:48,677 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x4e zxid:0x385012b6c31 txntype:2 reqpath:n/a investigate.log:2017-02-14 23:32:54,580 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)] - Committing request:: se while on the lead side, the requests are arrived out of order (in terms of Cxid) 2017-02-14 23:31:21,822 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2e zxid:0x385012a1eb8 txntype:1 reqpath:n/a 2017-02-14 23:31:21,824 - INFO [LearnerHandler-/10.45.111.116:53431] - Commit a proposal: sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0 txntype:2 reqpath:n/a 2017-02-14 23:31:21,824 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0 txntype:2 reqpath:n/a 2017-02-14 23:31:21,833 - INFO [LearnerHandler-/10.45.111.116:53431] - Commit a proposal: sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9 txntype:1 reqpath:n/a 2017-02-14 23:31:21,833 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9 txntype:1 reqpath:n/a 2017-02-14 23:31:21,999 - DEBUG [ProcessThread(sid:1 cport:-1):] - Got user-level KeeperException when processing sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:/loadgen/load_1776069644 Error Path:null Error:KeeperErrorCode = Session moved investigate.log.1:2017-02-14 23:31:21,999 - INFO [ProcessThread(sid:1 cport:-1):] - Processing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:/loadgen/load_1776069644 investigate.log.1:2017-02-14 23:31:21,999 - INFO [ProcessThread(sid:1 cport:-1):] - Leader proposing: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:/loadgen/load_1776069644 investigate.log.1:2017-02-14 23:31:22,048 - INFO [LearnerHandler-/10.45.112.127:53292] - Commit a proposal: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:/loadgen/load_1776069644 investigate.log.1:2017-02-14 23:31:22,048 - INFO [LearnerHandler-/10.45.112.127:53292] - Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1 reqpath:/loadgen/load_1776069644 2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Processing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x32 zxid:0x385012a3bd0 txntype:2 reqpath:n/a 2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Leader proposing: sessionid:0x4006b1b17808b0c type:delete cxid:0x32 zxid:0x385012a3bd0 txntype:2 reqpath:n/a 2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Processing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x33 zxid:0x385012a3bd1 txntype:1 reqpath:n/a Ideally, I would like to check if the server still owns the session (has a connection open) but it seems there is no easy way to check that. Any thoughts? The worst case is just be like the un-patched version to not check at all... > Fix a crashing bug in the mixed workloads commit processor > ---------------------------------------------------------- > > Key: ZOOKEEPER-2684 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.6.0 > Environment: with pretty heavy load on a real cluster > Reporter: Ryan Zhang > Assignee: Ryan Zhang > Priority: Blocker > Attachments: ZOOKEEPER-2684.patch > > > We deployed our build with ZOOKEEPER-2024 and it quickly started to crash > with the following error > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055 > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251 > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc > clearly something is not right in the new commit processor per session queue > implementation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)