[ 
https://issues.apache.org/jira/browse/KAFKA-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078308#comment-17078308
 ] 

Pradeep commented on KAFKA-9829:
--------------------------------

Hi [~cricket007] - We are doing high-availability testing to validate whether 
the Kafka cluster stays operational after a complete replacement of the 
ZooKeeper nodes. Note that we do not replace all the ZooKeeper nodes at once. 
After each replacement we wait long enough for the ZooKeeper nodes to 
synchronize before replacing the next one.
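
One way to make "wait for the nodes to synchronize" checkable rather than time-based is to ask each ZooKeeper node for its role before terminating the next one. The sketch below is only an illustration and not what was run in this test: it sends ZooKeeper's `srvr` four-letter command over a plain socket and reads the `Mode` and `Zxid` fields. The function names, the port 2181, and the timeout are assumptions. A node reporting `leader` or `follower` has joined the quorum, which is a necessary condition and not proof that session state is fully caught up.

```python
import socket


def parse_srvr(text):
    """Extract the Mode and Zxid fields from the output of the `srvr` command."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {"mode": fields.get("Mode"), "zxid": fields.get("Zxid")}


def query_srvr(host, port=2181, timeout=5.0):
    """Send `srvr` to a ZooKeeper node and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_in_quorum(host, port=2181):
    """True if the node answers and reports itself as leader or follower."""
    try:
        mode = parse_srvr(query_srvr(host, port))["mode"]
    except OSError:
        return False
    return mode in ("leader", "follower")
```

A replacement script could poll `is_in_quorum(new_node_host)` for the freshly provisioned node, and for the remaining nodes, before terminating the next instance.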

> Kafka brokers are unregistered on Zookeeper node replacement
> ------------------------------------------------------------
>
>                 Key: KAFKA-9829
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9829
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.1
>            Reporter: Pradeep
>            Priority: Major
>
> We have a Kafka cluster with 3 nodes connected to a ZooKeeper (3.4.14) 
> cluster of 3 nodes in AWS. We use an auto-scaling group to provision 
> replacement nodes upon failure. We are seeing an issue where the Kafka 
> brokers are unregistered when all the ZooKeeper nodes are replaced one after 
> the other. Each ZooKeeper node is terminated from the AWS console, and we 
> wait for a replacement node to be provisioned with ZooKeeper initialized 
> before terminating the next node.
> After each ZooKeeper node replacement except the last, the /brokers/ids path 
> shows all the Kafka broker ids in the cluster. Only after the final ZooKeeper 
> node replacement does the content of /brokers/ids become empty. Because of 
> this issue we cannot create any new topic or perform any other operations.
> We see the logs below on one of the replacement ZooKeeper nodes once all of 
> the original nodes have been replaced.
> {{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - 
> Expiring session 0x10003b973b50016, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - 
> Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - 
> Expiring session 0x30003a126690002, timeout of 6000ms exceeded}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - 
> Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - 
> Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - 
> Deleting ephemeral node /controller for session 0x30003a126690002}}
> {{2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - 
> Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002}}
>  
> I am not sure if the issue is related to KAFKA-5473.
>  
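
The log excerpt above pairs each expired session with the ephemeral znodes removed for it. A minimal sketch of that correlation follows, assuming each log entry sits on a single line (the entries are wrapped in this message) and that the wording matches the ZooKeeper 3.4.14 lines quoted in the description. The function name and regular expressions are illustrative, not part of ZooKeeper or Kafka.

```python
import re
from collections import defaultdict

# Patterns modelled on the log lines quoted in the issue description.
EXPIRE_RE = re.compile(r"Expiring session (0x[0-9a-fA-F]+), timeout of (\d+)ms exceeded")
DELETE_RE = re.compile(r"Deleting ephemeral node (\S+) for session (0x[0-9a-fA-F]+)")


def ephemerals_by_expired_session(log_text):
    """Map each expired session id to the ephemeral znodes deleted for it."""
    expired = set()
    deleted = defaultdict(list)
    for line in log_text.splitlines():
        match = EXPIRE_RE.search(line)
        if match:
            expired.add(match.group(1))
            continue
        match = DELETE_RE.search(line)
        if match:
            deleted[match.group(2)].append(match.group(1))
    return {sid: sorted(deleted.get(sid, [])) for sid in expired}


# Entries copied from the description, joined onto one line each.
LOG = """\
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002
"""

if __name__ == "__main__":
    for session_id, znodes in sorted(ephemerals_by_expired_session(LOG).items()):
        print(session_id, znodes)
```

On the excerpt above this maps the three expired sessions to brokers 1001, 1002, and 1003, with the session for broker 1001 also holding /controller, so every broker registration was removed in the same expiry pass.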



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
