[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474571#comment-13474571 ]
Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------

In doIO(), should we check the return value from:
{code}
sock.write(pbb);
{code}
Here is the jstack output from when testLargeNodeData hung:
{code}
"main" prio=5 tid=7f9bed000800 nid=0x10c382000 in Object.wait() [10c380000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <7dc1a15c0> (a org.apache.zookeeper.ClientCnxn$Packet)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
	- locked <7dc1a15c0> (a org.apache.zookeeper.ClientCnxn$Packet)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
	at org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:531)
{code}
I think we could send the data in chunks if pbb.remaining() exceeds a certain threshold.

> Zookeeper client hangs on creation of large nodes
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-1560
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.4, 3.5.0
>            Reporter: Igor Motov
>         Attachments: ZOOKEEPER-1560.patch
>
>
> To reproduce, try creating a node with 0.5 MB of data using the Java client.
> The test will hang waiting for a response from the server. See the attached
> patch for a test that reproduces the issue.
> It seems that ZOOKEEPER-1437 introduced a few issues into
> {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from
> sending large packets that require several invocations of
> {{SocketChannel.write}} to complete. The first issue is that the call to
> {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue
> even if the packet wasn't completely sent yet.
> It looks to me like this call should be moved under {{if (!pbb.hasRemaining())}}.
> The second issue is that {{p.createBB()}} reinitializes the {{ByteBuffer}}
> (resetting its position to the start of the packet) on every iteration, which
> confuses {{SocketChannel.write}}. The third issue is caused by extra calls to
> {{cnxn.getXid()}}, which increment the xid on every iteration and confuse the
> server.
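The three fixes described in the issue can be illustrated with a small, self-contained sketch. It is not the attached ZOOKEEPER-1560 patch: {{SketchPacket}}, {{SketchSender.doWrite}}, {{ThrottledChannel}} and {{LargePacketDemo}} are hypothetical, simplified stand-ins for {{ClientCnxn.Packet}} and the write half of {{ClientCnxnSocketNIO.doIO}}, and the wire format (length prefix, xid, payload) is assumed for illustration only. What it shows is the ordering the issue asks for: take the xid and build the {{ByteBuffer}} once per packet, let each write drain whatever the channel accepts, and dequeue the packet only once the buffer has no bytes remaining.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for ClientCnxn.Packet.
class SketchPacket {
    final byte[] payload;
    int xid = -1;   // assigned once, on the first send attempt
    ByteBuffer bb;  // built once, then drained across several writes

    SketchPacket(byte[] payload) { this.payload = payload; }

    // Assumed wire format: 4-byte length prefix, 4-byte xid, then the payload.
    void createBB() {
        bb = ByteBuffer.allocate(8 + payload.length);
        bb.putInt(4 + payload.length); // length of everything after the prefix
        bb.putInt(xid);
        bb.put(payload);
        bb.flip();
    }
}

// Hypothetical, simplified stand-in for the write half of ClientCnxnSocketNIO.doIO.
class SketchSender {
    final Deque<SketchPacket> outgoingQueue = new ArrayDeque<>();
    private int nextXid = 1;

    // Called once per "socket writable" event; returns the number of bytes written.
    int doWrite(WritableByteChannel sock) {
        SketchPacket p = outgoingQueue.peekFirst();
        if (p == null) {
            return 0;
        }
        if (p.bb == null) {
            // Issues 2 and 3: take an xid and serialize only once per packet,
            // so later calls resume from the buffer's current position.
            p.xid = nextXid++;
            p.createBB();
        }
        int written;
        try {
            written = sock.write(p.bb); // may be less than p.bb.remaining()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!p.bb.hasRemaining()) {
            // Issue 1: dequeue only after the whole packet has been sent.
            outgoingQueue.removeFirst();
        }
        return written;
    }
}

// Test double that accepts at most `limit` bytes per write, like a full socket buffer.
class ThrottledChannel implements WritableByteChannel {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final int limit;

    ThrottledChannel(int limit) { this.limit = limit; }

    @Override
    public int write(ByteBuffer src) {
        int n = Math.min(limit, src.remaining());
        byte[] chunk = new byte[n];
        src.get(chunk); // advances src.position() by n
        sink.write(chunk, 0, n);
        return n;
    }

    @Override public boolean isOpen() { return true; }
    @Override public void close() { }
}

class LargePacketDemo {
    public static void main(String[] args) {
        SketchSender sender = new SketchSender();
        sender.outgoingQueue.add(new SketchPacket(new byte[512 * 1024]));
        ThrottledChannel ch = new ThrottledChannel(64 * 1024);
        int calls = 0;
        while (!sender.outgoingQueue.isEmpty()) {
            sender.doWrite(ch);
            calls++;
        }
        // prints "9 writes, 524296 bytes sent"
        System.out.println(calls + " writes, " + ch.sink.size() + " bytes sent");
    }
}
```

With a channel that accepts 64 KB per call, the 0.5 MB packet stays at the head of the queue for nine calls, goes out with a single xid, and arrives byte-for-byte intact. Testing {{hasRemaining()}} covers the question about the return value of {{sock.write(pbb)}}: a channel write advances the buffer's position by exactly the number of bytes it reports, so the two checks agree, and no explicit chunking threshold is needed.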