[ https://issues.apache.org/jira/browse/KAFKA-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077359#comment-17077359 ]
Pradeep commented on KAFKA-9829:
--------------------------------
We use hostnames rather than IP addresses. These hostnames are configured
in Route53.
> Kafka brokers are unregistered on Zookeeper node replacement
> ------------------------------------------------------------
>
> Key: KAFKA-9829
> URL: https://issues.apache.org/jira/browse/KAFKA-9829
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.2.1
> Reporter: Pradeep
> Priority: Major
>
> We have a 3-node Kafka cluster connected to a 3-node Zookeeper (3.4.14)
> cluster in AWS. We use an auto-scaling group to provision replacement nodes
> upon failures. We are seeing an issue where the Kafka brokers become
> unregistered when all the Zookeeper nodes are replaced one after the other.
> Each Zookeeper node is terminated from the AWS console, and we wait for a
> replacement node to be provisioned with Zookeeper initialized before
> terminating the next node.
> After each Zookeeper node replacement except the last, the /brokers/ids path
> still shows all the Kafka broker ids in the cluster. Only after the final
> Zookeeper node replacement does /brokers/ids become empty. Because of this
> issue, we cannot create new topics or perform any other operations.
> We see the logs below on one of the Zookeeper nodes once all of the
> original nodes have been replaced.
> 2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded
> 2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded
> 2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded
> 2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016
> 2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e
> 2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002
> 2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002
>
> I am not sure if the issue is related to KAFKA-5473.
>
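The log excerpt quoted above pairs each expired session with the ephemeral znodes deleted for it. As a reading aid, here is a minimal Python sketch that does the same pairing mechanically. It assumes only the two message formats shown in the report; the function name `ephemerals_by_expired_session` and the `SAMPLE` list are illustrative and are not part of ZooKeeper or Kafka.

```python
import re
from collections import defaultdict

# Patterns for the two message shapes quoted in the report (assumed stable
# only for this excerpt; other ZooKeeper versions may log differently).
EXPIRE_RE = re.compile(r"Expiring session (0x[0-9a-f]+), timeout of (\d+)ms exceeded")
DELETE_RE = re.compile(r"Deleting ephemeral node (/[^\s}]+) for session (0x[0-9a-f]+)")


def ephemerals_by_expired_session(lines):
    """Map each expired session id to its timeout and the znodes deleted for it."""
    expired = {}                 # session id -> timeout in ms
    deleted = defaultdict(list)  # session id -> deleted ephemeral znode paths
    for line in lines:
        m = EXPIRE_RE.search(line)
        if m:
            expired[m.group(1)] = int(m.group(2))
            continue
        m = DELETE_RE.search(line)
        if m:
            deleted[m.group(2)].append(m.group(1))
    return {
        sid: {"timeout_ms": timeout, "znodes": sorted(deleted.get(sid, []))}
        for sid, timeout in expired.items()
    }


# Log lines copied from the report.
SAMPLE = [
    "2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded",
    "2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded",
    "2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded",
    "2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016",
    "2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e",
    "2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002",
    "2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002",
]

if __name__ == "__main__":
    for sid, info in ephemerals_by_expired_session(SAMPLE).items():
        print(sid, info["timeout_ms"], info["znodes"])
```

Run against the excerpt, this shows that all three broker sessions expired at the same instant with a 6000 ms timeout, and that the session holding /brokers/ids/1001 also held /controller.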
--
This message was sent by Atlassian Jira
(v8.3.4#803005)