Rui Abreu created KAFKA-9531:
--------------------------------
Summary: java.net.UnknownHostException loop on VM rolling update using CNAME
Key: KAFKA-9531
URL: https://issues.apache.org/jira/browse/KAFKA-9531
Project: Kafka
Issue Type: Bug
Components: clients, controller, producer
Affects Versions: 2.4.0
Reporter: Rui Abreu
Hello,
My cluster setup is based on VMs behind a DNS CNAME.
For example, node.internal is a CNAME to either nodeA.internal or nodeB.internal.
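A toy model of this setup may help when reading the logs below. It is a hypothetical sketch, not Kafka or JDK code: the ToyDns class and its methods are invented for illustration, it follows a single CNAME hop, and it uses the hostnames from this report. It shows that the alias node.internal stays valid across a rolling update while the name behind it changes, so anything that remembered the target name (here nodeB.internal) ends up holding a name that no longer resolves.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical in-memory DNS model: CNAME aliases plus the set of hosts that currently exist. */
class ToyDns {
    private final Map<String, String> cnames = new HashMap<>();
    private final Set<String> liveHosts = new HashSet<>();

    void addHost(String host) { liveHosts.add(host); }
    void removeHost(String host) { liveHosts.remove(host); }
    void pointAlias(String alias, String target) { cnames.put(alias, target); }

    /** Follows at most one CNAME hop, then returns the target name only if that host still exists. */
    Optional<String> resolve(String name) {
        String target = cnames.getOrDefault(name, name);
        return liveHosts.contains(target) ? Optional.of(target) : Optional.empty();
    }

    public static void main(String[] args) {
        ToyDns dns = new ToyDns();
        dns.addHost("nodeB.internal");
        dns.pointAlias("node.internal", "nodeB.internal");
        System.out.println(dns.resolve("node.internal"));  // Optional[nodeB.internal]

        // Rolling update: nodeA.internal replaces nodeB.internal behind the same alias.
        dns.addHost("nodeA.internal");
        dns.pointAlias("node.internal", "nodeA.internal");
        dns.removeHost("nodeB.internal");

        System.out.println(dns.resolve("node.internal"));  // Optional[nodeA.internal]
        System.out.println(dns.resolve("nodeB.internal")); // Optional.empty
    }
}
{code}

The alias keeps working throughout; only a client that kept the old target name is affected.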
Since kafka-client 1.2.1, we have observed that Kafka clients sometimes get
stuck in a loop with the following exception.
Example, after nodeB.internal is replaced with nodeA.internal:
{code:java}
2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN] - [Consumer clientId=consumer-6, groupId=consumer.group] Error connecting to node nodeB.internal:9092 (id: 2 rack: null)
java.net.UnknownHostException: nodeB.internal:9092
	at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
	at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
	at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
	at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) ~[stormjar.jar:?]
	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) ~[stormjar.jar:?]
	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) ~[stormjar.jar:?]
	at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) ~[stormjar.jar:?]
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) ~[stormjar.jar:?]
	at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) ~[stormjar.jar:?]
	at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) ~[stormjar.jar:?]
	at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) ~[stormjar.jar:?]
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) ~[stormjar.jar:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) ~[stormjar.jar:?]
	at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) ~[stormjar.jar:?]
	at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) ~[stormjar.jar:?]
	at org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) ~[storm-core-1.1.3.jar:1.1.3]
	at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) ~[storm-core-1.1.3.jar:1.1.3]
	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
{code}
The time the client spends in this loop varies arbitrarily, and the client
appears to make no progress at all while the loop lasts.
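The trace suggests a simple picture of the loop: the client's cached metadata still describes broker id 2 as nodeB.internal, and even the metadata update path (DefaultMetadataUpdater.maybeUpdate, then ClientUtils.resolve and InetAddress.getAllByName) tries to resolve that exact name, so each attempt fails the same way. The sketch below is a hypothetical simplification of that retry pattern, not the NetworkClient implementation: Resolver, afterRollingUpdate and attemptConnect are invented names, the fake resolver stands in for DNS after the rolling update, and the 10.0.0.1 address is arbitrary.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Map;

class StuckLoopSketch {
    /** Hypothetical resolver abstraction; InetAddress::getAllByName fits it for real DNS lookups. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Fake DNS after the rolling update: only nodeA.internal still exists. */
    static Resolver afterRollingUpdate() {
        try {
            // getByAddress builds an InetAddress without any DNS lookup.
            InetAddress nodeA = InetAddress.getByAddress("nodeA.internal", new byte[] {10, 0, 0, 1});
            Map<String, InetAddress[]> records =
                Collections.singletonMap("nodeA.internal", new InetAddress[] {nodeA});
            return host -> {
                InetAddress[] addrs = records.get(host);
                if (addrs == null) {
                    throw new UnknownHostException(host);
                }
                return addrs;
            };
        } catch (UnknownHostException e) {
            // Only thrown for an illegal address length, which a 4-byte IPv4 address cannot have.
            throw new IllegalStateException(e);
        }
    }

    /** Retries the hostname cached from metadata and returns how many attempts failed. */
    static int attemptConnect(Resolver resolver, String cachedHost, int maxAttempts) {
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                resolver.resolve(cachedHost);
                return failures; // resolved: a real client would now open the socket
            } catch (UnknownHostException e) {
                failures++; // the same stale name is retried, so the same failure repeats
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Resolver dns = afterRollingUpdate();
        System.out.println(attemptConnect(dns, "nodeB.internal", 5)); // 5
        System.out.println(attemptConnect(dns, "nodeA.internal", 5)); // 0
    }
}
{code}

Keeping the lookup behind an interface makes the fake and real resolvers interchangeable: with real DNS, InetAddress::getAllByName can be passed as the Resolver.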
This contrasts with other occurrences, in which the client recovers on its
own after a few seconds:
{code:java}
2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Group coordinator nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, will attempt rediscovery
2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Discovered group coordinator nodeB.internal:9092 (id: 2147483646 rack: null)
2020-02-08T01:15:37.885Z o.a.k.c.ClusterConnectionStates [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal
{code}
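The recovery log points to per-node bookkeeping: the coordinator is rediscovered under node id 2147483646, and the client notices that this id now maps to a different hostname. A hypothetical sketch of that bookkeeping follows. It is not the ClusterConnectionStates code; the class and method names are invented and the message only mirrors the wording of the log line. For the sequence seen in the log it prints "Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal".

{code:java}
import java.util.HashMap;
import java.util.Map;

class HostnameRefreshSketch {
    /** Hypothetical record of the last known hostname per node id. */
    private final Map<Integer, String> hostByNodeId = new HashMap<>();

    /** Stores the hostname from fresh metadata; returns true if it replaced a different one. */
    boolean update(int nodeId, String host) {
        String previous = hostByNodeId.put(nodeId, host);
        if (previous != null && !previous.equals(host)) {
            System.out.printf("Hostname for node %d changed from %s to %s%n", nodeId, previous, host);
            return true;
        }
        return false;
    }

    String hostFor(int nodeId) {
        return hostByNodeId.get(nodeId);
    }

    public static void main(String[] args) {
        HostnameRefreshSketch states = new HostnameRefreshSketch();
        states.update(2147483646, "nodeA.internal");
        // Coordinator rediscovery reports the same node id on the replacement host.
        states.update(2147483646, "nodeB.internal");
        System.out.println(states.hostFor(2147483646)); // nodeB.internal
    }
}
{code}

In this model, recovery depends on fresh metadata arriving with the new hostname; the stuck case above is one where that refresh apparently never happens.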
--
This message was sent by Atlassian Jira
(v8.3.4#803005)