FrankYang0529 opened a new pull request, #19240: URL: https://github.com/apache/kafka/pull/19240
There are two root causes:

1. When we uncleanly shut down `brokerToBeTheLeader`, we didn't wait for the result. As a result, the heartbeat sent to unfence the broker could carry a stale broker epoch. [0]
2. We used a different replica directory when uncleanly shutting down the broker. Even if the broker is unfenced, it cannot get an online directory, so `brokerToBeTheLeader` cannot be elected as the new leader. [1]

[0] https://github.com/apache/kafka/blob/a5325e029e2493f22925af99482ad9fa1eb06947/metadata/src/test/java/org/apache/kafka/controller/QuorumControllerTest.java#L484-L497
[1] https://github.com/apache/kafka/blob/a5325e029e2493f22925af99482ad9fa1eb06947/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L2470-L2477

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
