[ 
https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786494#comment-16786494
 ] 

ASF GitHub Bot commented on KAFKA-7974:
---------------------------------------

cmccabe commented on pull request #6305: Fix for KAFKA-7974: Avoid zombie 
AdminClient when node host isn't resolvable
URL: https://github.com/apache/kafka/pull/6305
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KafkaAdminClient loses worker thread/enters zombie state when initial DNS 
> lookup fails
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7974
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7974
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Nicholas Parker
>            Priority: Major
>
> Version: kafka-clients-2.1.0
> I have some code that creates creates a KafkaAdminClient instance and then 
> invokes listTopics(). I was seeing the following stacktrace in the logs, 
> after which the KafkaAdminClient instance became unresponsive:
> {code:java}
> ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 
> KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread 
> | adminclient-1':
> java.lang.IllegalStateException: No entry found for connection 0
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
>     at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
>     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
>     at java.lang.Thread.run(Thread.java:748){code}
> From looking at the code I was able to trace down a possible cause:
>  * NetworkClient.ready() invokes this.initiateConnect() as seen in the above 
> stacktrace
>  * NetworkClient.initiateConnect() invokes 
> ClusterConnectionStates.connecting(), which internally invokes 
> ClientUtils.resolve() to to resolve the host when creating an entry for the 
> connection.
>  * If this host lookup fails, a UnknownHostException can be thrown back to 
> NetworkClient.initiateConnect() and the connection entry is not created in 
> ClusterConnectionStates. This exception doesn't get logged so this is a guess 
> on my part.
>  * NetworkClient.initiateConnect() catches the exception and attempts to call 
> ClusterConnectionStates.disconnected(), which throws an IllegalStateException 
> because no entry had yet been created due to the lookup failure.
>  * This IllegalStateException ends up killing the worker thread and 
> KafkaAdminClient gets stuck, never returning from listTopics().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to