[ https://issues.apache.org/jira/browse/KAFKA-14885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zou shengfu updated KAFKA-14885:
--------------------------------
    Description: 
When a broker has network problems, it cannot connect to ZooKeeper or the controller. At this point we replace it with a new broker that has the same `broker.id` as the faulty broker, but we cannot stop the Kafka process on the faulty broker because of the network issue. Clients can therefore still connect to the faulty broker and produce and consume messages normally. However, data on the faulty broker may be lost, because some partition leaders are still on the faulty broker.

Does anyone have a good idea for solving this problem?

In my opinion, we can check the broker configuration (for example, the broker IP) when the broker reconnects to ZooKeeper, and the broker can exit if its configuration does not match the one in ZooKeeper. But if the broker cannot reconnect to ZooKeeper successfully, we may need to periodically compare the broker configuration on local disk with the one in ZooKeeper.

  was:
When a broker has network problems, it cannot connect to ZooKeeper or the controller. At this point we replace it with a new broker that has the same `broker.id` as the faulty broker, but we cannot stop the Kafka process on the faulty broker because of the network issue. Clients can therefore still connect to the faulty broker and produce and consume messages normally. However, data on the faulty broker may be lost, because some partition leaders are still on the faulty broker.

Does anyone have a good idea for solving this problem?

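The check proposed in the description could be sketched roughly as follows. This is only an illustration of the idea, not Kafka code: `BrokerIdentityCheck`, `Registration`, `Decision` and `decide` are hypothetical names, and the sketch assumes the caller has already read the registration stored in ZooKeeper for this `broker.id` (or found that it could not be read) and passes it in.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Kafka.
public class BrokerIdentityCheck {

    /** Minimal, assumed view of what a broker registers under its broker.id. */
    public static final class Registration {
        final int brokerId;
        final String host;
        final int port;

        public Registration(int brokerId, String host, int port) {
            this.brokerId = brokerId;
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Registration)) return false;
            Registration other = (Registration) o;
            return brokerId == other.brokerId
                    && port == other.port
                    && Objects.equals(host, other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(brokerId, host, port);
        }
    }

    public enum Decision { CONTINUE, EXIT, RETRY_LATER }

    /**
     * Compares the local configuration with the registration found in
     * ZooKeeper for the same broker.id.
     *
     * @param local      configuration read from the broker's local disk
     * @param registered registration read from ZooKeeper, or empty if it
     *                   could not be read (for example, ZooKeeper unreachable)
     */
    public static Decision decide(Registration local, Optional<Registration> registered) {
        if (!registered.isPresent()) {
            // Nothing to compare against yet; the periodic check tries again.
            return Decision.RETRY_LATER;
        }
        // A different host or port under the same broker.id means another
        // process has taken over this id, so this broker should exit.
        return local.equals(registered.get()) ? Decision.CONTINUE : Decision.EXIT;
    }
}
```

A periodic task on the broker would call `decide` with the locally configured values and shut the broker down on `EXIT`. Note that `RETRY_LATER` does not by itself protect against the case where ZooKeeper stays unreachable indefinitely; that case is the open question in the description.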
> Client can connect to broker and broker can not connect zookeeper
> -----------------------------------------------------------------
>
>                 Key: KAFKA-14885
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14885
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.2
>            Reporter: zou shengfu
>            Assignee: zou shengfu
>            Priority: Major
>
> When a broker has network problems, it cannot connect to ZooKeeper or the
> controller. At this point we replace it with a new broker that has the same
> `broker.id` as the faulty broker, but we cannot stop the Kafka process on
> the faulty broker because of the network issue. Clients can therefore still
> connect to the faulty broker and produce and consume messages normally.
> However, data on the faulty broker may be lost, because some partition
> leaders are still on the faulty broker.
> Does anyone have a good idea for solving this problem?
> In my opinion, we can check the broker configuration (for example, the
> broker IP) when the broker reconnects to ZooKeeper, and the broker can exit
> if its configuration does not match the one in ZooKeeper. But if the broker
> cannot reconnect to ZooKeeper successfully, we may need to periodically
> compare the broker configuration on local disk with the one in ZooKeeper.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)