FrankYang0529 opened a new pull request, #19240:
URL: https://github.com/apache/kafka/pull/19240

   There are two root causes:
   1. When we trigger an unclean shutdown of `brokerToBeTheLeader`, we don't wait for the result. As a result, the heartbeat we then send to unfence the broker may carry a stale broker epoch. [0]
   2. We use a different replica directory when performing the unclean shutdown of the broker. Even after the broker is unfenced, it has no online directory for the replica, so `brokerToBeTheLeader` cannot be elected as the new leader. [1]
   
   
   [0] https://github.com/apache/kafka/blob/a5325e029e2493f22925af99482ad9fa1eb06947/metadata/src/test/java/org/apache/kafka/controller/QuorumControllerTest.java#L484-L497
   
   [1] https://github.com/apache/kafka/blob/a5325e029e2493f22925af99482ad9fa1eb06947/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L2470-L2477
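The second root cause can be modeled the same way. In this hypothetical sketch, `DirectoryEligibilitySketch`, `Registration` and `canLead` are illustrative names, not Kafka classes. The eligibility rule is assumed from the description above: a replica may lead only if its broker is unfenced and the directory assigned to the replica is among the directories the broker reported as online. See [1] for the actual check, which may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class DirectoryEligibilitySketch {
    // Hypothetical registration state: fencing flag plus the log directories
    // the broker reported as online. Not Kafka's BrokerRegistration class.
    record Registration(boolean fenced, Set<UUID> onlineDirs) { }

    private final Map<Integer, Registration> brokers = new HashMap<>();

    // A freshly (re-)registered broker starts out fenced in this model.
    void register(int brokerId, Set<UUID> onlineDirs) {
        brokers.put(brokerId, new Registration(true, Set.copyOf(onlineDirs)));
    }

    void unfence(int brokerId) {
        Registration r = brokers.get(brokerId);
        if (r == null) {
            throw new IllegalStateException("broker " + brokerId + " is not registered");
        }
        brokers.put(brokerId, new Registration(false, r.onlineDirs()));
    }

    // Assumed rule: the broker must be unfenced AND the replica's assigned
    // directory must be one of that broker's online directories.
    boolean canLead(int brokerId, UUID assignedDir) {
        Registration r = brokers.get(brokerId);
        return r != null && !r.fenced() && r.onlineDirs().contains(assignedDir);
    }

    public static void main(String[] args) {
        UUID assignedDir = new UUID(0L, 1L);   // directory the replica is assigned to
        UUID otherDir = new UUID(0L, 2L);      // a different directory
        int brokerToBeTheLeader = 1;
        DirectoryEligibilitySketch controller = new DirectoryEligibilitySketch();

        // Re-registering with a different directory: unfenced, but still not eligible.
        controller.register(brokerToBeTheLeader, Set.of(otherDir));
        controller.unfence(brokerToBeTheLeader);
        System.out.println(controller.canLead(brokerToBeTheLeader, assignedDir)); // prints false

        // Re-registering with the assigned directory: eligible once unfenced.
        controller.register(brokerToBeTheLeader, Set.of(assignedDir));
        controller.unfence(brokerToBeTheLeader);
        System.out.println(controller.canLead(brokerToBeTheLeader, assignedDir)); // prints true
    }
}
```

In this model, unfencing alone is not enough; the broker must also come back with the directory the replica is assigned to.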
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.