[ 
https://issues.apache.org/jira/browse/KAFKA-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716280#comment-17716280
 ] 

Chris Egerton commented on KAFKA-14666:
---------------------------------------

[~mimaison] I believe this should be a release blocker. We don't necessarily 
have to merge the fix associated with this issue for 3.5.0, but the alternative 
would be to revert several other improvements and fixes we've made to MM2 that, 
while useful, exacerbated the impact of this issue.

I've been reviewing the PR more closely over the past few days with the goal of 
merging it either today or tomorrow (the day of the 3.5.0 code freeze). I've 
just approved it and am waiting on the CI build to complete. Are you okay with 
backporting this to the 3.5 branch if CI goes well?

> MM2 should translate consumer group offsets behind replication flow
> -------------------------------------------------------------------
>
>                 Key: KAFKA-14666
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14666
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>            Reporter: Greg Harris
>            Assignee: Greg Harris
>            Priority: Major
>             Fix For: 3.5.0
>
>
> MirrorMaker2 includes an offset translation feature which can translate the 
> offsets for an upstream consumer group to a corresponding downstream consumer 
> group. It does this by keeping a topic of offset-syncs to correlate upstream 
> and downstream offsets, and translates any source offsets which are ahead of 
> the replication flow.
> However, if a replication flow is closer to the end of a topic than the 
> consumer group, then the offset translation feature will refuse to translate 
> the offset for correctness reasons. This is because the MirrorCheckpointTask 
> only keeps the latest offset correlation between source and target, so it 
> does not have sufficient information to translate older offsets.
> The workarounds for this issue are to:
> 1. Pause the replication flow occasionally to allow the source to get ahead 
> of MM2.
> 2. Increase offset.lag.max to delay offset syncs, increasing the window 
> for translation to happen. With the fix for KAFKA-12468, this will also 
> increase the lag of applications that are ahead of the replication flow, so 
> this is a tradeoff.
> Instead, the MirrorCheckpointTask should provide correct and best-effort 
> translation for consumer groups behind the replication flow by keeping 
> additional state, or re-reading the offset-syncs topic. This should be a 
> substantial improvement for use cases where applications have a higher 
> latency to commit than the replication flow, or where applications are 
> reading from the earliest offset.
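
To make the limitation described above concrete, here is a minimal sketch of translating against a bounded history of offset syncs. It is not MM2's actual code: the class `OffsetSyncSketch`, its methods, and the evict-the-oldest policy are all hypothetical, chosen only for illustration. It assumes each sync pairs one upstream offset with one downstream offset for a single topic-partition, and that translation should never return a downstream offset past the group's real position.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical illustration only; not the MirrorCheckpointTask implementation.
public class OffsetSyncSketch {
    // upstream offset -> downstream offset, for one topic-partition
    private final TreeMap<Long, Long> syncs = new TreeMap<>();
    private final int maxSyncs;

    public OffsetSyncSketch(int maxSyncs) {
        this.maxSyncs = maxSyncs;
    }

    // Record a new sync and evict the oldest ones beyond the retention limit.
    public void recordSync(long upstream, long downstream) {
        syncs.put(upstream, downstream);
        while (syncs.size() > maxSyncs) {
            syncs.pollFirstEntry();
        }
    }

    // Translate an upstream group offset using the newest sync at or below it.
    // Returns empty when every retained sync is ahead of the group.
    public OptionalLong translate(long upstreamGroupOffset) {
        Map.Entry<Long, Long> sync = syncs.floorEntry(upstreamGroupOffset);
        if (sync == null) {
            return OptionalLong.empty();
        }
        if (sync.getKey() == upstreamGroupOffset) {
            return OptionalLong.of(sync.getValue());
        }
        // The group is past the synced record; without a closer sync, the only
        // safe claim is "one past the synced downstream offset".
        return OptionalLong.of(sync.getValue() + 1);
    }
}
```

With `maxSyncs` set to 1, only the latest sync is retained, so a group that is behind that sync cannot be translated, which mirrors the behavior reported here. With a larger `maxSyncs`, the same group offset maps to a conservative downstream position taken from an older sync.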



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
