[jira] [Commented] (KAFKA-19607) MirrorMaker2 Offset Replication Issue

Greg Harris (Jira) Fri, 15 Aug 2025 08:10:04 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-19607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014142#comment-18014142
 ]


Greg Harris commented on KAFKA-19607:
-------------------------------------

Hi [~geric], please see the other related tickets in this area:
 * https://issues.apache.org/jira/browse/KAFKA-16364
 * https://issues.apache.org/jira/browse/KAFKA-16291 
 * https://issues.apache.org/jira/browse/KAFKA-15564 
 * [https://github.com/apache/kafka/pull/15423] 

For a detailed explanation of this behavior, please see this conference talk: 

[https://current.confluent.io/2024-sessions/mirrormaker-2s-offset-translation-isnt-exactly-once-and-thats-okay]
 

Translation gets less precise the farther the group is from the end of the 
topic: with an upstream lag of ~200, the translated downstream lag can double 
to ~400. Because you have only 300 messages in the topic, it looks like the 
offset never gets translated, or it translates to 0 or 1.

With a larger example (1000 messages produced, 800 messages consumed) I would 
expect translation to lead to a downstream consumer lag of ~400. Also be aware 
that resetting the consumer offsets may not be sufficient to clear the MM2 
state in the checkpoints topics. Make sure to use a new consumer group for 
further experiments with the same MM2 instance.
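
As a rough illustration of the numbers in the two examples above, the sketch 
below applies the "lag can double" rule of thumb and clamps the result at the 
start of the topic. It is not MM2's translation algorithm. The function name, 
the factor of 2, and the assumption that the topic starts at offset 0 are all 
illustrative choices for this sketch.

```python
def worst_case_translation(end_offset: int, committed_offset: int, factor: int = 2):
    """Estimate (translated_offset, downstream_lag) with a rule of thumb.

    Assumptions (not MM2 internals): the topic starts at offset 0, and the
    downstream lag can grow to roughly `factor` times the upstream lag.
    """
    if not 0 <= committed_offset <= end_offset:
        raise ValueError("committed_offset must be between 0 and end_offset")
    upstream_lag = end_offset - committed_offset
    # The translated offset cannot fall before the start of the topic,
    # so the downstream lag is capped at the topic length.
    downstream_lag = min(upstream_lag * factor, end_offset)
    translated_offset = end_offset - downstream_lag
    return translated_offset, downstream_lag


if __name__ == "__main__":
    # 300 produced, 100 consumed: doubling the lag of 200 overshoots the
    # topic start, so the estimate clamps to offset 0.
    print(worst_case_translation(300, 100))
    # 1000 produced, 800 consumed: lag of 200 doubles to about 400.
    print(worst_case_translation(1000, 800))
```

With only 300 messages, the doubled lag is larger than the topic itself, which 
is why the translated offset appears stuck at the beginning.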

For applications, you need to either let your consumers reach 0 lag before 
cut-over, or tolerate some re-delivery on the destination side.
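
One way to check the "reach 0 lag prior to cut-over" condition is to compare 
committed offsets against end offsets per partition on the source cluster. 
The sketch below is a minimal version of that comparison. The function names 
and the dictionary inputs are hypothetical; in practice the offsets would 
come from your own tooling or an admin client, which is not shown here.

```python
def partitions_with_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Return {(topic, partition): lag} for partitions the group has not finished.

    Both inputs map (topic, partition) to an offset. A partition with no
    committed offset is treated as fully unconsumed (assumes the log starts
    at offset 0, which is an assumption of this sketch).
    """
    lagging = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp, 0)
        lag = max(end - committed, 0)
        if lag > 0:
            lagging[tp] = lag
    return lagging


def ready_for_cutover(end_offsets: dict, committed_offsets: dict) -> bool:
    """True only when every partition has zero lag."""
    return not partitions_with_lag(end_offsets, committed_offsets)


if __name__ == "__main__":
    end = {("orders", 0): 300, ("orders", 1): 50}
    committed = {("orders", 0): 100, ("orders", 1): 50}
    print(partitions_with_lag(end, committed))
    print(ready_for_cutover(end, committed))
```

If the check reports remaining lag at cut-over time, the consumers on the 
destination side should be prepared to see some records again.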

Hope this helps!

> MirrorMaker2 Offset Replication Issue
> -------------------------------------
>
>                 Key: KAFKA-19607
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19607
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 4.0.0
>            Reporter: geric
>            Priority: Critical
>              Labels: RedHat
>
> I am using *Apache Kafka 4.0* with *MirrorMaker 2* to link the primary 
> cluster ({*}clusterA{*}) to the secondary cluster ({*}clusterB{*}).
> The secondary cluster will not have any producers or consumers until a 
> disaster recovery event occurs, at which point all producers and consumers 
> will switch to it.
> *Setup:*
>  * Dedicated standalone MirrorMaker 2 node
>  * {{IdentityReplicationPolicy}} (no topic renaming)
>  * No clients connected to secondary cluster under normal operation
> *MirrorMaker 2 config:*
>  {{# Cluster aliases
> clusters = clusterA, clusterB
> # Bootstrap servers
> clusterA.bootstrap.servers = serverA-kafka-1:9092
> clusterB.bootstrap.servers = serverB-kafka-1:9092
> # Replication policy
> replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
> # Offset/Checkpoint sync
> emit.checkpoints.enabled=true
> emit.checkpoints.interval.seconds=5
> sync.group.offsets.enabled=true
> sync.group.offsets.interval.seconds=5
> offset.lag.max=10
> refresh.topics.interval.seconds=5}}
> ----
> h3. Test results:
>  # *Produce 300 messages when MirrorMaker is running*
> *Expected:* Topic offset synced within a minute
> *Result:* ✅ Passed
>  # *Consume 100 messages when MirrorMaker is running, then terminate the 
> consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Restart MirrorMaker after test #2*
> *Expected:* Consumer offset synced
> *Result:* ✅ Passed
>  # *Repeat test #2 — consume 100 messages when MirrorMaker is running, then 
> terminate the consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Restart MirrorMaker after test #4*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Consume messages but keep consumer running*
> *Expected:* Offset synced
> *Result:* ✅ Passed
> ----
> h3. Problem:
> Consumer offsets appear to only sync in these cases:
>  # When MirrorMaker is restarted and the consumer offset does *not* already 
> exist in the secondary cluster (initial sync), or
>  # When the consumer is still connected at the time of sync, *or* when the 
> consumer has reached the end of the offset (i.e., consumed all available 
> messages).
> However, if the consumer exits immediately after consuming some messages (but 
> {*}before reaching the end of the topic{*}), the committed offset is *never 
> synced* to the target cluster.
> ----
> h3. Additional Context / Related Issues
> This problem seems related to an open discussion in the Apache Kafka mailing 
> list:
> *MirrorCheckpointConnector does not replicate final batch of offsets*
> [https://lists.apache.org/thread/dxn9jyotl00f7ov541299cd8tlcl1z00]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
