Rajini Sivaram created KAFKA-12948:
--------------------------------------

             Summary: NetworkClient.close(node) with node in connecting state makes NetworkClient unusable
                 Key: KAFKA-12948
                 URL: https://issues.apache.org/jira/browse/KAFKA-12948
             Project: Kafka
          Issue Type: Bug
          Components: network
    Affects Versions: 2.7.1, 2.8.0
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram
             Fix For: 2.7.2, 2.8.1

`NetworkClient.close(node)` closes the node and removes it from `ClusterConnectionStates.nodeState`, but not from `ClusterConnectionStates.connectingNodes`. Subsequent `NetworkClient.poll()` invocations throw `IllegalStateException`, which leaves the NetworkClient unusable until the node is either removed from `connectingNodes` or added back to `nodeState`.

We don't use `NetworkClient.close(node)` in clients, but we do use it in the clients that brokers start for the replica fetcher and the controller. Since brokers call `NetworkClientUtils.isReady()` before establishing connections, and this invokes `poll()`, the NetworkClient never recovers.

Exception stack trace:

{code:java}
java.lang.IllegalStateException: No entry found for connection 0
	at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:409)
	at org.apache.kafka.clients.ClusterConnectionStates.isConnectionSetupTimeout(ClusterConnectionStates.java:446)
	at org.apache.kafka.clients.ClusterConnectionStates.lambda$nodesWithConnectionSetupTimeout$0(ClusterConnectionStates.java:458)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.apache.kafka.clients.ClusterConnectionStates.nodesWithConnectionSetupTimeout(ClusterConnectionStates.java:459)
	at org.apache.kafka.clients.NetworkClient.handleTimedOutConnections(NetworkClient.java:807)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:564)
	at org.apache.kafka.clients.NetworkClientUtils.isReady(NetworkClientUtils.java:42)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
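
For illustration only, the failure mode described above can be reproduced with a small standalone model. This is a sketch, not Kafka's code: the `ConnectingNodesSketch` and `States` classes and the methods `connecting`, `removeBuggy`, `removeBoth`, `state` and `timedOut` are hypothetical stand-ins, and `removeBoth` is just one way to keep the two collections consistent, not necessarily the change made for this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ConnectingNodesSketch {

    // Hypothetical stand-in for ClusterConnectionStates: a per-node state map
    // plus a separate set of node ids that are currently connecting.
    static class States {
        final Map<String, Long> nodeState = new HashMap<>(); // id -> connect start (ms)
        final Set<String> connectingNodes = new HashSet<>();

        void connecting(String id, long nowMs) {
            nodeState.put(id, nowMs);
            connectingNodes.add(id);
        }

        // Models the reported behaviour: only nodeState is cleaned up.
        void removeBuggy(String id) {
            nodeState.remove(id);
        }

        // Assumed repair: remove the id from both collections.
        void removeBoth(String id) {
            nodeState.remove(id);
            connectingNodes.remove(id);
        }

        long state(String id) {
            Long start = nodeState.get(id);
            if (start == null)
                throw new IllegalStateException("No entry found for connection " + id);
            return start;
        }

        // Iterates connectingNodes and looks each id up in nodeState,
        // so a stale id in connectingNodes makes every call throw.
        Set<String> timedOut(long nowMs, long timeoutMs) {
            return connectingNodes.stream()
                    .filter(id -> nowMs - state(id) > timeoutMs)
                    .collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) {
        States buggy = new States();
        buggy.connecting("0", 0L);
        buggy.removeBuggy("0");
        try {
            buggy.timedOut(1000L, 500L);
        } catch (IllegalStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }

        States repaired = new States();
        repaired.connecting("0", 0L);
        repaired.removeBoth("0");
        System.out.println("repaired: " + repaired.timedOut(1000L, 500L));
    }
}
```

In this model the exception recurs on every `timedOut` call until the stale id leaves `connectingNodes`, which mirrors why a caller that polls before reconnecting never gets the chance to recover.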