[
https://issues.apache.org/jira/browse/KAFKA-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krzysztof Piecuch resolved KAFKA-12513.
---------------------------------------
Resolution: Invalid
I've just read the docs; it looks like everything is fine on the Kafka and
ZooKeeper side.
Sorry for the confusion.
> Kafka zookeeper client can't connect when the first zookeeper server is
> offline
> -------------------------------------------------------------------------------
>
> Key: KAFKA-12513
> URL: https://issues.apache.org/jira/browse/KAFKA-12513
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 2.3.1, 2.4.1, 2.7.0
> Environment: kafka_2.13-2.7.0, kernel 5.4.0-52-generic (Ubuntu),
> Scala 2.13.3-400
> Reporter: Krzysztof Piecuch
> Priority: Critical
>
> The Kafka ZooKeeper client library will not connect to any ZooKeeper server in
> the "zookeeper string" when the first server is offline. This causes the
> cluster to crash hard, and the first ZooKeeper node must be resurrected to get
> the cluster back into a healthy state.
> The crash does not always happen immediately after zk0 goes offline, because
> Kafka might have connections established to different ZooKeeper instances.
> When such a connection gets dropped and Kafka needs to reconnect, everything
> crashes hard.
>
> Demo:
> This works:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> Topic: duma PartitionCount: 6 ReplicationFactor: 3 Configs: compression.type=uncompressed,retention.bytes=322122547200
> Topic: duma Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
> Topic: duma Partition: 1 Leader: 2 Replicas: 2,1,0 Isr: 0,1,2
> Topic: duma Partition: 2 Leader: 0 Replicas: 0,2,1 Isr: 0,1,2
> Topic: duma Partition: 3 Leader: 1 Replicas: 1,2,0 Isr: 1,0,2
> Topic: duma Partition: 4 Leader: 2 Replicas: 2,0,1 Isr: 1,0,2
> Topic: duma Partition: 5 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
> {code}
> Now let's mess with the zookeeper string and see how the ZooKeeper client
> reacts.
> Changing the last server in the zookeeper string works as expected:
> {{kafka-topics.sh}} connected to ZooKeeper but couldn't find the topic
> (because of the bogus zookeeper string):
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,1.1.1.1:2181/hex8c --describe --topic duma
> Error while executing topic command : Topic 'duma' does not exist as expected
> [2021-03-20 23:01:45,535] ERROR java.lang.IllegalArgumentException: Topic 'duma' does not exist as expected
> at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:484)
> at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:390)
> at kafka.admin.TopicCommand$.main(TopicCommand.scala:67)
> at kafka.admin.TopicCommand.main(TopicCommand.scala)
> (kafka.admin.TopicCommand$) {code}
> However, if the first server in the zookeeper cluster is unavailable, the
> ZooKeeper client won't connect to any of the ZooKeeper servers:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper 1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> [2021-03-20 23:02:43,888] WARN Client session timed out, have not heard from server in 30012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
> at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
> at kafka.zookeeper.ZooKeeperClient$$Lambda$31.000000005D399170.apply$mcV$sp(Unknown Source)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
> at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
> at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
> at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
> at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
> at kafka.admin.TopicCommand.main(TopicCommand.scala) {code}
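
A likely reading of the "Invalid" resolution, offered as an assumption since the
resolving comment does not spell it out: the Kafka documentation gives the
zookeeper.connect format as hostname1:port1,hostname2:port2,hostname3:port3/chroot/path,
with the chroot path appended once at the end of the whole string. The commands
above append /hex8c to every host instead. As far as I understand the ZooKeeper
client, everything from the first "/" onward is taken as the chroot, so only the
first host would remain in the server list. The Python sketch below is a
simplified model of that splitting, not the library's code; the function name
split_connect_string is hypothetical and IPv6 literals are not handled.

```python
from typing import List, Optional, Tuple

DEFAULT_PORT = 2181  # ZooKeeper's default client port


def split_connect_string(
    connect_string: str,
) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Simplified model (an assumption, not ZooKeeper's actual code):
    everything from the first '/' onward is treated as the chroot path,
    and only the text before it is split into host:port entries."""
    slash = connect_string.find("/")
    if slash >= 0:
        chroot: Optional[str] = connect_string[slash:]
        if chroot == "/":
            chroot = None  # a bare "/" means no chroot
        hosts_part = connect_string[:slash]
    else:
        chroot = None
        hosts_part = connect_string

    servers: List[Tuple[str, int]] = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            servers.append((host, int(port)))
        else:
            servers.append((entry, DEFAULT_PORT))
    return servers, chroot


# Connection string as used in the report: chroot repeated after every host.
reported = "zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c"
# Documented form: chroot given once, after the last host.
documented = "zk0.gambit:2181,zk1.gambit:2181,zk2.gambit:2181/hex8c"

for label, value in (("reported", reported), ("documented", documented)):
    servers, chroot = split_connect_string(value)
    print(f"{label}: servers={servers} chroot={chroot}")
```

Under this model the reported string yields a single server (zk0.gambit) and a
chroot that contains the remaining hosts. That would match both observations in
the report: replacing the first host leaves nothing reachable, and replacing the
last host only changes the chroot, so the topic is not found.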