[ https://issues.apache.org/jira/browse/ZOOKEEPER-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arshad Mohammad updated ZOOKEEPER-2570:
---------------------------------------
    Description:
ZooKeeper clients are timed out when ZooKeeper servers are very busy. Clients throw the exception below and fail all pending operations
{code}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
{code}
Clients log the following information
{noformat}
2016-09-22 01:49:08,001 [myid:127.0.0.1:11228] - WARN  [main-SendThread(127.0.0.1:11228):ClientCnxn$SendThread@1181] - Client session timed out, have not heard from server in 13908ms for sessionid 0x20000d21b280000
2016-09-22 01:49:08,001 [myid:127.0.0.1:11228] - INFO  [main-SendThread(127.0.0.1:11228):ClientCnxn$SendThread@1229] - Client session timed out, have not heard from server in 13908ms for sessionid 0x20000d21b280000, closing socket connection and attempting reconnect
{noformat}

*STEPS TO REPRODUCE:*
# Create multi operation
{code}
List<Op> ops = new ArrayList<Op>();
for (int i = 0; i < N; i++) {
    Op create = Op.create(rootNode + "/" + i, "".getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
    ops.add(create);
}
{code}
Choose N such that the total multi operation request size is less than but near 1 MB. For a bigger request size, increase jute.maxbuffer on the servers
# Submit the multi operation request
{code} zooKeeper.multi(ops);{code}
# After repeating the above steps a few times, the issue is reproduced

  was:
ZooKeeper server expires the client session when the server is continuously under high load.
The steps below reproduce the issue
# Create multi operation
{code}
List<Op> ops = new ArrayList<Op>();
for (int i = 0; i < N; i++) {
    Op create = Op.create(rootNode + "/" + i, "".getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
    ops.add(create);
}
{code}
Choose N such that the total multi operation request size is less than but near 1 MB.
For a bigger request size, increase jute.maxbuffer on the servers
# Submit the multi operation request
{code} zooKeeper.multi(ops);{code}
# After repeating the above steps a few times, the client throws {{ConnectionLossException}} and the server logs "Expiring session 0x100b0ff5ecc0003, timeout of xxxxms exceeded"

Normally the server expires a session when it has not received a ping from the client for longer than the client's session timeout. But in this case the client is continuously performing operations against the server, so the server should not expire the session.


> ZooKeeper clients are timed out when ZooKeeper servers are very busy
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2570
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2570
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>
> ZooKeeper clients are timed out when ZooKeeper servers are very busy. Clients
> throw the exception below and fail all pending operations
> {code}
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> {code}
> Clients log the following information
> {noformat}
> 2016-09-22 01:49:08,001 [myid:127.0.0.1:11228] - WARN
> [main-SendThread(127.0.0.1:11228):ClientCnxn$SendThread@1181] - Client
> session timed out, have not heard from server in 13908ms for sessionid
> 0x20000d21b280000
> 2016-09-22 01:49:08,001 [myid:127.0.0.1:11228] - INFO
> [main-SendThread(127.0.0.1:11228):ClientCnxn$SendThread@1229] - Client
> session timed out, have not heard from server in 13908ms for sessionid
> 0x20000d21b280000, closing socket connection and attempting reconnect
> {noformat}
> *STEPS TO REPRODUCE:*
> # Create multi operation
> {code}
> List<Op> ops = new ArrayList<Op>();
> for (int i = 0; i < N; i++) {
>     Op create = Op.create(rootNode + "/" + i, "".getBytes(),
>             ZooDefs.Ids.OPEN_ACL_UNSAFE,
>             CreateMode.PERSISTENT);
>     ops.add(create);
> }
> {code}
> Choose N such that the total multi operation request size is less than but
> near 1 MB. For a bigger request size, increase jute.maxbuffer on the
> servers
> # Submit the multi operation request
> {code} zooKeeper.multi(ops);{code}
> # After repeating the above steps a few times, the issue is reproduced



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
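
For picking N in the reproduction steps, a rough sizing sketch may help. It is not part of the original report: the class name {{MultiSizeEstimate}}, the root path {{/multiTest}}, and the 1024-byte safety margin are hypothetical, and the per-field byte counts are an assumption about how a create op inside a multi request is serialized (4-byte length prefixes, a 9-byte per-op header, one world:anyone ACL entry). Treat the result as a starting point and adjust N empirically. The only fixed value used is the default jute.maxbuffer of 0xfffff bytes.

```java
import java.nio.charset.StandardCharsets;

public class MultiSizeEstimate {
    // Default jute.maxbuffer: 0xfffff bytes, just under 1 MB.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // ASSUMPTION: per-op multi header = int type + boolean done + int err.
    static final int MULTI_HEADER_BYTES = 4 + 1 + 4;

    // ASSUMPTION: OPEN_ACL_UNSAFE serializes as a one-element vector:
    // count(4) + perms(4) + "world"(4 + 5) + "anyone"(4 + 6).
    static final int OPEN_ACL_UNSAFE_BYTES = 4 + 4 + (4 + 5) + (4 + 6);

    // Estimated bytes one create op adds to a multi request.
    static int estimateCreateOpBytes(String path, int dataLength) {
        int pathBytes = path.getBytes(StandardCharsets.UTF_8).length;
        return MULTI_HEADER_BYTES
                + 4 + pathBytes        // path: length prefix + UTF-8 bytes
                + 4 + dataLength       // data: length prefix + payload
                + OPEN_ACL_UNSAFE_BYTES
                + 4;                   // create flags
    }

    // Largest N such that creates for rootNode/0 .. rootNode/(N-1),
    // each with empty data, stay within budgetBytes.
    static int chooseN(String rootNode, int budgetBytes) {
        long total = MULTI_HEADER_BYTES; // assumed trailing end-of-multi header
        int n = 0;
        while (true) {
            int next = estimateCreateOpBytes(rootNode + "/" + n, 0);
            if (total + next > budgetBytes) {
                return n;
            }
            total += next;
            n++;
        }
    }

    public static void main(String[] args) {
        // Hypothetical margin for request framing and headers.
        int budget = DEFAULT_JUTE_MAXBUFFER - 1024;
        System.out.println("Estimated N: " + chooseN("/multiTest", budget));
    }
}
```

The returned value can be used as N in the loop from the steps above. If the server rejects the request as too large, lower N or raise jute.maxbuffer on the servers as the description suggests.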