[https://issues.apache.org/jira/browse/KAFKA-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Pradeep updated KAFKA-9829:
---------------------------
Description:
We have a Kafka cluster with 3 nodes connected to a Zookeeper (3.4.14) cluster
of 3 nodes in AWS. We use an auto-scaling group to provision replacement nodes
upon failure. We are seeing an issue where the Kafka brokers are getting
unregistered when all the Zookeeper nodes are replaced one after the other.
Each Zookeeper node is terminated from the AWS console, and we wait for its
replacement to be provisioned with Zookeeper initialized before terminating
the next node.
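The rolling replacement above depends on knowing when a new node has actually joined the ensemble rather than merely started. The report does not say how readiness is checked, so the following is only a minimal sketch, assuming the ZooKeeper `srvr` four-letter command is reachable on client port 2181 (it is normally enabled by default). The helper names `parse_mode`, `srvr` and `wait_until_in_quorum` are hypothetical and not part of any Kafka or ZooKeeper tooling.

```python
import socket
import time
from typing import Optional


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. 'leader', 'follower'), or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # No Mode line, e.g. "This ZooKeeper instance is not currently serving requests"
    return None


def srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def wait_until_in_quorum(host: str, port: int = 2181,
                         deadline_s: float = 600.0, poll_s: float = 5.0) -> str:
    """Poll until the node reports leader or follower; raise TimeoutError otherwise."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            mode = parse_mode(srvr(host, port))
            if mode in ("leader", "follower"):
                return mode
        except OSError:
            pass  # not reachable yet (refused or timed out); keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"{host}:{port} did not join the quorum within {deadline_s}s")
```

Usage would be along the lines of `wait_until_in_quorum("zk-replacement-host")` before terminating the next instance, where the hostname is a placeholder. A `leader` or `follower` reply only shows that the node has joined the quorum; whether that is enough to avoid the problem reported here is not established.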
After every Zookeeper node replacement, the /brokers/ids path shows all the
Kafka broker ids in the cluster. Only after the final Zookeeper node is
replaced does the content of /brokers/ids become empty. Because of this, we
are not able to create any new topics or perform any other operations.
We see the following logs on one of the Zookeeper nodes once all of the
original nodes have been replaced.
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002}}
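The log excerpt shows ZooKeeper's ephemeral-node semantics at work: when a client session expires, every ephemeral znode created under that session is deleted, which is how the three /brokers/ids entries and /controller disappear. A small sketch, assuming log lines in the format quoted in this issue, that pairs each expired session with the znodes deleted for it. The function names `correlate` and `lost_broker_ids` are hypothetical and not part of any ZooKeeper or Kafka tooling.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Regexes for the two message shapes in the excerpt (assumed to appear one per line).
EXPIRE_RE = re.compile(r"Expiring session (0x[0-9a-fA-F]+), timeout of (\d+)ms exceeded")
DELETE_RE = re.compile(r"Deleting ephemeral node (\S+) for session (0x[0-9a-fA-F]+)")


def correlate(log_lines: List[str]) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return the expired session ids and, per session, the ephemeral znodes deleted."""
    expired: Set[str] = set()
    deleted: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        m = EXPIRE_RE.search(line)
        if m:
            expired.add(m.group(1))
            continue
        m = DELETE_RE.search(line)
        if m:
            deleted[m.group(2)].append(m.group(1))  # keep log order per session
    return expired, dict(deleted)


def lost_broker_ids(deleted: Dict[str, List[str]]) -> List[str]:
    """Sorted broker ids whose /brokers/ids/<id> registration was deleted."""
    prefix = "/brokers/ids/"
    return sorted(
        path[len(prefix):]
        for paths in deleted.values()
        for path in paths
        if path.startswith(prefix)
    )
```

Run against the seven log lines above, this reports broker ids 1001, 1002 and 1003 as lost, with session 0x30003a126690002 owning both /controller and /brokers/ids/1001.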
I am not sure whether this issue is related to KAFKA-5473.
was:
We have a Kafka cluster with 3 nodes connected to a Zookeeper (3.4.14) cluster
of 3 nodes in AWS. We use an auto-scaling group to provision replacement nodes
upon failure. We are seeing an issue where the Kafka brokers are getting
unregistered when all the Zookeeper nodes are replaced one after the other.
Each Zookeeper node is terminated from the AWS console, and we wait for its
replacement to be provisioned with Zookeeper initialized before terminating
the next node.
After every Zookeeper node replacement, the /brokers/ids path shows all the
Kafka broker ids in the cluster. Only after the final Zookeeper node is
replaced does the content of /brokers/ids become empty.
We see the following logs on one of the Zookeeper nodes once all of the
original nodes have been replaced.
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002}}
{{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002}}
I am not sure whether this issue is related to KAFKA-5473.
> Kafka brokers are un-registered on Zookeeper node replacement
> -------------------------------------------------------------
>
> Key: KAFKA-9829
> URL: https://issues.apache.org/jira/browse/KAFKA-9829
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.2.1
> Reporter: Pradeep
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)