Nicholas Parker created KAFKA-7974:
--------------------------------------
Summary: KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails
Key: KAFKA-7974
URL: https://issues.apache.org/jira/browse/KAFKA-7974
Project: Kafka
Issue Type: Bug
Reporter: Nicholas Parker
Version: kafka-clients-2.1.0
I have some code that creates a KafkaAdminClient instance and then
invokes listTopics(). I saw the following stacktrace in the logs, after
which the KafkaAdminClient instance became unresponsive:
{code:java}
ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
java.lang.IllegalStateException: No entry found for connection 0
    at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
    at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
    at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
    at java.lang.Thread.run(Thread.java:748){code}
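The "unresponsive" symptom is what you would expect when a client does all of its network work on a single worker thread: once an uncaught exception kills that thread, nothing ever completes the pending futures. The following is a minimal, self-contained sketch of that pattern, not KafkaAdminClient code; the class ZombieWorkerDemo, its queue, and callWithTimeout() are hypothetical. It also shows one caller-side defensive measure, bounding the wait with Future.get(timeout, unit). KafkaFuture (returned by ListTopicsResult.names()) implements java.util.concurrent.Future, so the same bounded get should be usable there, although it only surfaces the hang instead of fixing it.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class ZombieWorkerDemo {

    /**
     * Hypothetical model of a client with one worker thread. Returns a short
     * description of what the caller observed.
     */
    public static String callWithTimeout(long timeoutMs)
            throws InterruptedException, ExecutionException {
        BlockingQueue<CompletableFuture<String>> pending = new LinkedBlockingQueue<>();
        AtomicReference<Throwable> workerError = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                CompletableFuture<String> call = pending.take();
                // Stand-in for the IllegalStateException in the stacktrace: it is
                // thrown before the call is completed and nothing here catches it.
                if (!call.isDone()) {
                    throw new IllegalStateException("No entry found for connection 0");
                }
                call.complete("topic-a");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-admin-client-thread");
        // Record the uncaught exception instead of letting the default handler print it.
        worker.setUncaughtExceptionHandler((t, e) -> workerError.set(e));
        worker.start();

        CompletableFuture<String> listTopicsCall = new CompletableFuture<>();
        pending.put(listTopicsCall);
        worker.join(); // the worker is now dead; nothing will complete the call

        try {
            // A bare get() here would block forever, like the hung listTopics().
            return listTopicsCall.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out; worker alive=" + worker.isAlive()
                    + "; worker error=" + workerError.get().getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(200));
    }
}
{code}

Running main() prints "timed out; worker alive=false; worker error=No entry found for connection 0": the caller regains control, but the client object itself stays broken because its only worker thread is gone.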
From looking at the code, I was able to trace a possible cause:
* NetworkClient.ready() invokes this.initiateConnect() as seen in the above
stacktrace
* NetworkClient.initiateConnect() invokes
ClusterConnectionStates.connecting(), which internally invokes
ClientUtils.resolve() to resolve the host when creating an entry for the
connection.
* If this host lookup fails, an UnknownHostException can be thrown back to
NetworkClient.initiateConnect(), and the connection entry is never created in
ClusterConnectionStates. This exception isn't logged, so this step is a guess
on my part.
* NetworkClient.initiateConnect() catches the exception and attempts to call
ClusterConnectionStates.disconnected(), which throws an IllegalStateException
because no entry had yet been created due to the lookup failure.
* This IllegalStateException ends up killing the worker thread and
KafkaAdminClient gets stuck, never returning from listTopics().
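The ordering problem in the steps above can be reproduced in isolation. This is a hypothetical, heavily simplified model, not Kafka's ClusterConnectionStates: a map keyed by node id, a resolve() stand-in that always throws UnknownHostException to simulate the DNS failure, and an initiateConnect() that, like the traced code, calls disconnected() from its catch block. The createEntryFirst flag illustrates one possible fix (record the entry before resolving the host); it is an assumption for illustration, not necessarily the change the Kafka project made.

{code:java}
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

public class ConnectionStatesDemo {

    enum State { CONNECTING, DISCONNECTED }

    /** Hypothetical, heavily simplified stand-in for ClusterConnectionStates. */
    public static class States {
        private final Map<String, State> nodeState = new HashMap<>();
        private final boolean createEntryFirst;

        public States(boolean createEntryFirst) {
            this.createEntryFirst = createEntryFirst;
        }

        void connecting(String id, String host) throws UnknownHostException {
            if (createEntryFirst) {
                nodeState.put(id, State.CONNECTING); // entry exists even if resolve fails
                resolve(host);
            } else {
                resolve(host); // failure here means no entry is ever created
                nodeState.put(id, State.CONNECTING);
            }
        }

        void disconnected(String id) {
            if (!nodeState.containsKey(id)) {
                throw new IllegalStateException("No entry found for connection " + id);
            }
            nodeState.put(id, State.DISCONNECTED);
        }
    }

    /** Stand-in for ClientUtils.resolve(): always fails, simulating a DNS outage. */
    static void resolve(String host) throws UnknownHostException {
        throw new UnknownHostException(host);
    }

    /** Models the catch block in initiateConnect(): on failure, mark the node disconnected. */
    public static String initiateConnect(States states, String id, String host) {
        try {
            states.connecting(id, host);
            return "connecting";
        } catch (UnknownHostException e) {
            try {
                states.disconnected(id);
                return "disconnected after " + e.getClass().getSimpleName();
            } catch (IllegalStateException ise) {
                // In the real client this escapes and kills the worker thread;
                // it is caught here only so the demo can report it.
                return "IllegalStateException: " + ise.getMessage();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(initiateConnect(new States(false), "0", "broker.invalid"));
        System.out.println(initiateConnect(new States(true), "0", "broker.invalid"));
    }
}
{code}

With createEntryFirst set to false, the demo reports the same "No entry found for connection 0" message as the stacktrace; with it set to true, the lookup failure is recorded as an ordinary disconnect.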