[ https://issues.apache.org/jira/browse/KAFKA-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615172#comment-16615172 ]
ASF GitHub Bot commented on KAFKA-7414:
---------------------------------------

hachikuji opened a new pull request #5654: KAFKA-7414; Out of range errors should never be fatal for follower
URL: https://github.com/apache/kafka/pull/5654

This patch fixes the inconsistent handling of out of range errors in the replica fetcher. Previously we would raise a fatal error if the follower's offset is ahead of the leader's and unclean leader election is not enabled. The behavior was inconsistent depending on the message format. With KIP-101/KIP-279, upon becoming a follower, the replica would use leader epoch information to reconcile the end of the log with the leader and simply truncate. Additionally, with the old format, the check is not really bulletproof for detecting data loss since the unclean leader's end offset might have already caught up to the follower's offset at the time of its initial fetch or when it queries for the current log end offset.

With this patch, we simply skip the unclean leader election check and allow the needed truncation to occur. When the truncation offset is below the high watermark, a warning will be logged. This makes the behavior consistent for all message formats and removes a scenario in which an error on one partition can bring the broker down.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Do not fail broker on out of range offsets in replica fetcher
> -------------------------------------------------------------
>
>                 Key: KAFKA-7414
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7414
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> In the replica fetcher, we have logic to detect the case when the follower's
> offset is ahead of the leader's. If unclean leader election is not enabled,
> we raise a fatal error and kill the broker.
> This behavior is inconsistent depending on the message format. With
> KIP-101/KIP-279, upon becoming a follower, the replica would use leader epoch
> information to reconcile the end of the log with the leader and simply
> truncate. Additionally, with the old format, the check is not really
> bulletproof for detecting data loss since the unclean leader's end offset
> might have already caught up to the follower's offset at the time of its
> initial fetch or when it queries for the current log end offset.
> To make the logic consistent, we could raise a fatal error whenever the
> follower has to truncate below the high watermark. However, the fatal error
> is probably overkill and it would be better to log a warning since most of
> the damage is already done if the leader has already been elected and this
> causes a huge blast radius.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
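
As a rough illustration of the behavior change described in the pull request, the two code paths can be sketched in Java as below. This is not the code from the patch: `OutOfRangeSketch`, `FollowerLog`, `handleOutOfRangeOld`, `handleOutOfRangeNew` and `truncateTo` are hypothetical names invented for this sketch, the fatal error is modeled as a plain exception rather than a broker shutdown, and capping the high watermark after truncation is an assumption made only to keep the toy model consistent.

```java
import java.util.ArrayList;
import java.util.List;

public class OutOfRangeSketch {

    // Hypothetical model of a follower's log for one partition (not a Kafka class).
    static final class FollowerLog {
        long logEndOffset;
        long highWatermark;
        final List<String> warnings = new ArrayList<>();

        FollowerLog(long logEndOffset, long highWatermark) {
            this.logEndOffset = logEndOffset;
            this.highWatermark = highWatermark;
        }
    }

    // Old behavior as described in the issue: if the follower is ahead of the
    // leader and unclean leader election is disabled, treat it as fatal.
    // The real broker would be killed; here an exception stands in for that.
    static void handleOutOfRangeOld(FollowerLog log, long leaderEndOffset,
                                    boolean uncleanElectionEnabled) {
        if (leaderEndOffset < log.logEndOffset && !uncleanElectionEnabled) {
            throw new IllegalStateException("Follower offset " + log.logEndOffset
                    + " is ahead of leader end offset " + leaderEndOffset);
        }
        truncateTo(log, leaderEndOffset);
    }

    // Behavior described for the patch: skip the unclean leader election check,
    // always allow the truncation, and only warn when it goes below the
    // high watermark.
    static void handleOutOfRangeNew(FollowerLog log, long leaderEndOffset) {
        if (leaderEndOffset < log.highWatermark) {
            log.warnings.add("Truncating to " + leaderEndOffset
                    + ", which is below high watermark " + log.highWatermark);
        }
        truncateTo(log, leaderEndOffset);
    }

    // Assumption of this sketch: truncation never extends the log, and the
    // high watermark is capped so it never exceeds the log end offset.
    static void truncateTo(FollowerLog log, long offset) {
        log.logEndOffset = Math.min(log.logEndOffset, offset);
        log.highWatermark = Math.min(log.highWatermark, log.logEndOffset);
    }

    public static void main(String[] args) {
        FollowerLog log = new FollowerLog(100L, 90L);
        handleOutOfRangeNew(log, 80L);
        System.out.println("logEndOffset=" + log.logEndOffset
                + " warnings=" + log.warnings);
    }
}
```

The sketch also hints at why the old check is described as not bulletproof: it only fires while the leader's end offset is still behind the follower's. If the unclean leader has already caught up by the time the follower compares offsets, `handleOutOfRangeOld` sees nothing unusual even though the logs may have diverged.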