mmatloka commented on PR #21134: URL: https://github.com/apache/kafka/pull/21134#issuecomment-3658984359
> Thanks for the patch! I've a question:
>
> > The original fix depends on ELR
>
> I'm unsure about this. Going through the original fix, IIUC, it modifies the "unclean" check to be based on `LeaderRecoveryState.RECOVERING` [[0](https://github.com/apache/kafka/blob/95d164a3ae67b8d3860567528e2ba6ca0c532fa2/metadata/src/main/java/org/apache/kafka/metadata/PartitionRegistration.java#L166)] which was introduced with [KIP-704](https://cwiki.apache.org/confluence/display/KAFKA/KIP-704%3A+Send+a+hint+to+the+partition+leader+to+recover+the+partition) and as a result should be available with 3.9. It also modifies the calculation of `electionFromElrCounter` which was introduced in 4.1 with [KAFKA-18954](https://issues.apache.org/jira/browse/KAFKA-18954) but that I suspect that can be dropped from the backport?

Hi, the real question is which events actually lead to a situation where an adding replica becomes the leader in one step. One case is probably when the partition is recovering. However, what we saw in practice was probably(?) the result of a partition reassignment, most likely performed by Confluent SBC: a new cluster, completely healthy and idle, with nothing happening, and suddenly an adding replica becomes the leader and the metrics show an unclean election. Maybe a manual reassignment could be used to try to reproduce this 🤔. Does a partition reassignment cause the leader to go through the recovering state?
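To make the question concrete, here is a minimal sketch, not Kafka's actual code, that contrasts two ways of flagging a leader change as unclean: a hypothetical heuristic that checks whether the new leader was in the previous ISR, and a check in the spirit of the original fix that looks only at `LeaderRecoveryState.RECOVERING`. The names `UncleanElectionSketch`, `PartitionState`, `RecoveryState` and `reassignmentScenario` are illustrative. The scenario assumes that a single metadata change can add replica 4 to the ISR, complete the reassignment and move leadership to it, and that such a reassignment does not set RECOVERING. That last point is exactly the open question above.

```java
import java.util.List;

// Hypothetical, heavily simplified model of a partition change.
// It does NOT use Kafka's real PartitionRegistration or LeaderRecoveryState.
public class UncleanElectionSketch {

    // Stand-in for the two states of Kafka's LeaderRecoveryState.
    enum RecoveryState { RECOVERED, RECOVERING }

    // Snapshot of a partition before or after one metadata change.
    record PartitionState(List<Integer> isr,
                          List<Integer> addingReplicas,
                          int leader,
                          RecoveryState recoveryState) {}

    // Hypothetical heuristic: a leader change counts as unclean if the
    // new leader was not in the ISR of the previous snapshot.
    static boolean uncleanByIsrMembership(PartitionState prev, PartitionState next) {
        return next.leader() != prev.leader() && !prev.isr().contains(next.leader());
    }

    // Check in the spirit of the original fix: a leader change counts as
    // unclean only if it leaves the partition in the RECOVERING state.
    static boolean uncleanByRecoveryState(PartitionState prev, PartitionState next) {
        return next.leader() != prev.leader()
                && next.recoveryState() == RecoveryState.RECOVERING;
    }

    // Assumed scenario: replica 4 is being added and replica 1 removed.
    // Replica 4 catches up, and ONE change adds it to the ISR, completes the
    // reassignment and moves leadership from 1 to 4 without any recovery.
    static PartitionState[] reassignmentScenario() {
        PartitionState before = new PartitionState(
                List.of(1, 2, 3), List.of(4), 1, RecoveryState.RECOVERED);
        PartitionState after = new PartitionState(
                List.of(4, 2, 3), List.of(), 4, RecoveryState.RECOVERED);
        return new PartitionState[] {before, after};
    }

    public static void main(String[] args) {
        PartitionState[] s = reassignmentScenario();
        System.out.println("ISR-membership heuristic: " + uncleanByIsrMembership(s[0], s[1]));
        System.out.println("RECOVERING-based check: " + uncleanByRecoveryState(s[0], s[1]));
    }
}
```

Under these assumptions the ISR-membership heuristic reports `true` while the RECOVERING-based check reports `false`, which is why it matters whether a reassignment can ever leave the partition in the recovering state.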
