[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083655#comment-15083655 ]
Powell Molleti commented on ZOOKEEPER-2186:
-------------------------------------------

Markus, I have come across the same issue and decided to implement this by re-sending the same notification. I am working on this as part of ZOOKEEPER-901; see also some of the discussion about this in ZOOKEEPER-1045. Let me know what you think of this idea.

I think this has the potential to solve the user-level keep-alive implementation without the need to send new bits in the header and/or to introduce a new message for keep-alive. However, this breaks the current FLE (FastLeaderElection) because of this code: http://bit.ly/1PdWY1D

{code:title=FastLeaderElection.java|borderStyle=solid}
// Verify if there is any change in the proposed leader
while((n = recvqueue.poll(finalizeWait,
        TimeUnit.MILLISECONDS)) != null){
    if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
            proposedLeader, proposedZxid, proposedEpoch)){
        recvqueue.put(n);
        break;
    }
}
{code}

If I am not mistaken, this while loop is in error. Every notification that is not a better proposal starts a fresh finalizeWait poll, so the loop only exits once no message at all arrives within a full finalizeWait window, and periodically re-sent notifications keep it running. It should instead use a global clock to limit how long it polls in total, rather than hoping that no one sends any message within the finalizeWait window. I am hoping to negotiate a change here if the submitted patch is found to be reasonable.

> QuorumCnxManager#receiveConnection may crash with random input
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2186
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-2186-v3.4.patch, ZOOKEEPER-2186.patch,
> ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch
>
>
> This will allocate an arbitrarily large byte buffer (and try to read it!):
> {code}
> public boolean receiveConnection(Socket sock) {
>     Long sid = null;
>     ...
>         sid = din.readLong();
>         // next comes the #bytes in the remainder of the message
>
>         int num_remaining_bytes = din.readInt();
>         byte[] b = new byte[num_remaining_bytes];
>         // remove the remainder of the message from din
>
>         int num_read = din.read(b);
> {code}
> This will crash the QuorumCnxManager thread, so the cluster will keep going
> but future elections might fail to converge (ditto for leaving/joining
> members).
> Patch coming up in a bit.
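The crash in the quoted report comes from trusting `num_remaining_bytes` exactly as it arrives on the socket: a negative value makes `new byte[...]` throw `NegativeArraySizeException`, and a huge one can throw `OutOfMemoryError`. A minimal sketch of a defensive read follows. It assumes a hypothetical upper bound `MAX_REMAINING_BYTES` and a hypothetical helper class `BoundedHeaderReader`; neither is a ZooKeeper name, and this is not necessarily what the attached patches do. It validates the length before allocating and uses `readFully` instead of `read`, because `read(byte[])` may return after filling only part of the buffer.

```java
import java.io.DataInputStream;
import java.io.IOException;

/**
 * Hypothetical helper: reads the length-prefixed remainder of a connection
 * request (the part after the sid) without trusting the length blindly.
 */
final class BoundedHeaderReader {
    // Assumed sanity limit for illustration only; not a ZooKeeper constant.
    static final int MAX_REMAINING_BYTES = 512 * 1024;

    static byte[] readRemainder(DataInputStream din) throws IOException {
        int numRemainingBytes = din.readInt();
        // Reject negative or oversized lengths before allocating anything.
        if (numRemainingBytes < 0 || numRemainingBytes > MAX_REMAINING_BYTES) {
            throw new IOException("Unreasonable remainder length: " + numRemainingBytes);
        }
        byte[] b = new byte[numRemainingBytes];
        // readFully blocks until the buffer is full and throws EOFException
        // (an IOException) if the stream ends first; read(b) gives no such guarantee.
        din.readFully(b);
        return b;
    }
}
```

With this shape, every malformed header surfaces as an `IOException`, which a caller could handle by closing the offending socket instead of letting an unchecked error terminate the connection-handling thread.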
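Powell's suggestion for the `finalizeWait` loop in FastLeaderElection can be illustrated by computing one deadline up front and polling only for the time that remains. This way, a stream of re-sent notifications that are not better proposals cannot stretch the wait indefinitely. This is a sketch, not the ZooKeeper implementation. The `Notification` type and `totalOrderPredicate(...)` are replaced by a generic element type `T` and a `Predicate<T>`, and the class and method names (`DeadlinePoll`, `changedWithin`) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical deadline-bounded variant of the finalizeWait polling loop. */
final class DeadlinePoll {
    /**
     * Waits at most waitMillis in total. Returns true if an element accepted by
     * isBetter arrives in time (it is put back on the queue, as the original loop
     * does), or false once the overall budget expires.
     */
    static <T> boolean changedWithin(BlockingQueue<T> q, long waitMillis, Predicate<T> isBetter)
            throws InterruptedException {
        // One deadline for the whole loop; nanoTime is immune to wall-clock changes.
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            T n = q.poll(remaining, TimeUnit.NANOSECONDS);
            if (n == null) {
                break; // budget used up with no better proposal
            }
            if (isBetter.test(n)) {
                q.put(n); // hand it back to the main election loop
                return true;
            }
            // Not better (e.g. a re-sent keep-alive notification): keep waiting,
            // but only for what is left of the original budget.
        }
        return false;
    }
}
```

A `true` result corresponds to the original loop's `recvqueue.put(n); break;` path, and `false` to the `n == null` case. The difference is that ignored notifications now consume the budget instead of restarting it.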