[
https://issues.apache.org/jira/browse/KAFKA-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077839#comment-17077839
]
Jordan Moore commented on KAFKA-9829:
-------------------------------------
I feel like autoscaling Zookeeper or Kafka is a bad idea.
You need at least 3 healthy ZK nodes at all times for the ensemble to be
considered "stable", and with 3 you can lose only one for fault tolerance.
Similarly, unless you have some external script to handle partition migration,
horizontal scaling of brokers isn't possible.
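The failure mode in the report below comes down to ZooKeeper's ephemeral-node
semantics: broker registrations under /brokers/ids are ephemeral znodes tied to
client sessions, and when a session expires its ephemerals are deleted. A toy
model of that rule (all class and method names here are hypothetical, not
ZooKeeper's real API) shows why replacing the last original ensemble member can
empty /brokers/ids:

```python
class ToyZooKeeper:
    """Simplified model: ephemeral znodes live only as long as their session."""

    def __init__(self, session_timeout=6.0):
        self.session_timeout = session_timeout
        self.sessions = {}    # session_id -> time of last heartbeat
        self.ephemerals = {}  # znode path -> owning session_id

    def create_session(self, session_id, now):
        self.sessions[session_id] = now

    def create_ephemeral(self, path, session_id):
        self.ephemerals[path] = session_id

    def heartbeat(self, session_id, now):
        self.sessions[session_id] = now

    def tick(self, now):
        # Expire sessions whose timeout elapsed; delete their ephemerals,
        # analogous to the "Expiring session ... timeout of 6000ms exceeded"
        # and "Deleting ephemeral node /brokers/ids/..." log lines below.
        expired = [s for s, t in self.sessions.items()
                   if now - t > self.session_timeout]
        for s in expired:
            del self.sessions[s]
            owned = [p for p, owner in self.ephemerals.items() if owner == s]
            for path in owned:
                del self.ephemerals[path]


zk = ToyZooKeeper(session_timeout=6.0)
for broker in (1001, 1002, 1003):
    zk.create_session(broker, now=0.0)
    zk.create_ephemeral("/brokers/ids/%d" % broker, broker)

# While brokers heartbeat within the timeout, registrations survive.
zk.tick(now=5.0)
assert len(zk.ephemerals) == 3

# If no heartbeat reaches the (replaced) ensemble within the timeout,
# every session expires and /brokers/ids empties.
zk.tick(now=7.0)
assert zk.ephemerals == {}
```

Brokers normally re-register after a session expiry; if they never reconnect to
the new ensemble (e.g. stale addresses after replacement), the registrations
stay gone, which matches the empty /brokers/ids described below.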
> Kafka brokers are unregistered on Zookeeper node replacement
> ------------------------------------------------------------
>
> Key: KAFKA-9829
> URL: https://issues.apache.org/jira/browse/KAFKA-9829
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.2.1
> Reporter: Pradeep
> Priority: Major
>
> We have a Kafka cluster with 3 nodes connected to a Zookeeper (3.4.14)
> cluster of 3 nodes in AWS. We use an auto-scaling group to provision
> replacement nodes upon failures. We are seeing an issue where the Kafka
> brokers get unregistered when all the Zookeeper nodes are replaced one after
> the other. Each Zookeeper node is terminated from the AWS console, and we
> wait for a replacement node to be provisioned with Zookeeper initialized
> before terminating the next node.
> After each Zookeeper node replacement, the /brokers/ids path shows all the
> Kafka broker ids in the cluster. But on the final Zookeeper node replacement,
> the content of /brokers/ids becomes empty. Because of this issue we are not
> able to create any new topics or perform any other operations.
> We are seeing the logs below on one of the replaced Zookeeper nodes after all
> of the original nodes have been replaced.
> {{2020-03-26 20:29:20,303 [myid:3] - INFO
> [SessionTracker:ZooKeeperServer@355] -
> Expiring session 0x10003b973b50016, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,303 [myid:3] - INFO
> [SessionTracker:ZooKeeperServer@355] -
> Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,303 [myid:3] - INFO
> [SessionTracker:ZooKeeperServer@355] -
> Expiring session 0x30003a126690002, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] -
> Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] -
> Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] -
> Deleting ephemeral node /controller for session 0x30003a126690002}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] -
> Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002}}
>
> I am not sure if the issue is related to KAFKA-5473.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)