[ https://issues.apache.org/jira/browse/KAFKA-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956915#comment-15956915 ]
ASF GitHub Bot commented on KAFKA-5014:
---------------------------------------

GitHub user ijuma opened a pull request:

    https://github.com/apache/kafka/pull/2813

    KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijuma/kafka kafka-5014-least-loaded-node-should-check-if-channel-is-ready

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2813

----
commit a418b0157eab6740967852ea601e1f94f30cf630
Author: Ismael Juma <ism...@juma.me.uk>
Date:   2017-04-05T14:13:16Z

    KAFKA-5014: NetworkClient.leastLoadedNode should check if channel is ready

----


> SSL Channel not ready but tcp is established and the server is hung will not sending metadata
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5014
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5014
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1, 0.10.2.0
>            Reporter: Pengwei
>            Priority: Minor
>             Fix For: 0.11.0.0
>
>
> In our test environment, QA hung one of the brokers the producer was
> connected to. The producer then got stuck in the send method and threw
> an exception: failed to update metadata after the request timeout.
> I found the cause: when the producer chose one of the brokers to send a
> metadata request to, it connected to that broker, but the broker was hung.
> The TCP connection was established and NetworkClient marked the broker as
> connected, but the SSL channel was not ready yet, so the channel as a
> whole was not ready.
> NetworkClient.leastLoadedNode then chose this connected node every time
> to send the metadata request, even though its channel was never ready.
> As a result, the producer stayed stuck fetching metadata and never tried
> another node. The client should not get stuck just because one node is hung.
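The selection problem described in the report can be illustrated with a small, self-contained Java model. It is a hypothetical sketch, not Kafka's `NetworkClient` code: the `State` enum, the `Node` class, the `leastLoadedNode(nodes, offset, requireReady)` signature, and the explicit starting `offset` (standing in for a varying scan start) are all assumptions made for illustration. With `requireReady = false`, any idle node with a TCP connection is returned immediately, so the hung broker wins on every call: its metadata request can never be sent, so its in-flight count stays at zero. With `requireReady = true`, in the spirit of the PR title, that node no longer counts as idle and usable, and other nodes get a chance.

```java
import java.util.List;

/**
 * A simplified, hypothetical model of least-loaded-node selection.
 * This is NOT Kafka's NetworkClient; all names here are illustrative.
 */
public class LeastLoadedNodeSketch {

    // Hypothetical connection states: TCP can be established while the
    // SSL channel is still not ready (e.g. the broker hangs mid-handshake).
    enum State { DISCONNECTED, TCP_CONNECTED_NOT_READY, READY }

    static final class Node {
        final String id;
        final State state;
        final int inFlight; // number of in-flight requests to this node

        Node(String id, State state, int inFlight) {
            this.id = id;
            this.state = state;
            this.inFlight = inFlight;
        }
    }

    /**
     * Scans nodes starting at {@code offset} and returns the first idle node
     * whose connection counts as usable; otherwise the node with the fewest
     * in-flight requests (ties go to the first one seen in scan order).
     *
     * @param requireReady false models the reported bug (a TCP connection is
     *                     enough); true models checking that the channel is ready
     */
    static Node leastLoadedNode(List<Node> nodes, int offset, boolean requireReady) {
        int fewest = Integer.MAX_VALUE;
        Node found = null;
        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.get((offset + i) % nodes.size());
            boolean usable = requireReady
                    ? node.state == State.READY
                    : node.state != State.DISCONNECTED;
            if (node.inFlight == 0 && usable) {
                return node; // idle and "usable": stop scanning right away
            }
            if (node.inFlight < fewest) {
                fewest = node.inFlight; // best candidate seen so far
                found = node;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("0", State.DISCONNECTED, 0),
                new Node("1", State.TCP_CONNECTED_NOT_READY, 0), // hung broker
                new Node("2", State.DISCONNECTED, 0));
        for (int offset = 0; offset < nodes.size(); offset++) {
            System.out.println("offset " + offset
                    + ": old=" + leastLoadedNode(nodes, offset, false).id
                    + " new=" + leastLoadedNode(nodes, offset, true).id);
        }
        // offset 0: old=1 new=0
        // offset 1: old=1 new=1
        // offset 2: old=1 new=2
    }
}
```

With every node idle, the ready-checking variant falls back to the first node in scan order, so varying the starting offset lets the client move past the hung broker. In this model, a disconnected node returned that way is one the client would then try to connect to. The original check, by contrast, returns node `1` for every offset.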