[ https://issues.apache.org/jira/browse/KAFKA-19607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014142#comment-18014142 ]
Greg Harris commented on KAFKA-19607:
-------------------------------------

Hi [~geric]

Please see the other related tickets in this area:
 * https://issues.apache.org/jira/browse/KAFKA-16364
 * https://issues.apache.org/jira/browse/KAFKA-16291
 * https://issues.apache.org/jira/browse/KAFKA-15564
 * [https://github.com/apache/kafka/pull/15423]

For a detailed explanation of this behavior, please see this conference talk: [https://current.confluent.io/2024-sessions/mirrormaker-2s-offset-translation-isnt-exactly-once-and-thats-okay]

Farther from the end of the topic (lag is ~200) the translation is worse (lag can double to ~400). Because you have only 300 messages in the topic, it looks like the offset never gets translated, or translates to 0 or 1. With a larger example (1000 messages produced, 800 messages consumed) I would expect translation to lead to a downstream consumer lag of ~400.

Also be aware that resetting the consumer offsets may not be sufficient to clear the MM2 state in the checkpoints topics. Make sure to use a new consumer group for further experiments with the same MM2 instance.

For applications, you either need to let your consumers reach 0 lag prior to cut-over, or you need to tolerate some re-delivery on the destination side.

Hope this helps!

> MirrorMaker2 Offset Replication Issue
> -------------------------------------
>
>                 Key: KAFKA-19607
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19607
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 4.0.0
>            Reporter: geric
>            Priority: Critical
>              Labels: RedHat
>
> I am using *Apache Kafka 4.0* with *MirrorMaker 2* to link the primary
> cluster ({*}clusterA{*}) to the secondary cluster ({*}clusterB{*}).
> The secondary cluster will not have any producers or consumers until a
> disaster recovery event occurs, at which point all producers and consumers
> will switch to it.
> *Setup:*
> * Dedicated standalone MirrorMaker 2 node
> * {{IdentityReplicationPolicy}} (no topic renaming)
> * No clients connected to secondary cluster under normal operation
> *MirrorMaker 2 config:*
> {{# Cluster aliases
> clusters = clusterA, clusterB
> # Bootstrap servers
> clusterA.bootstrap.servers = serverA-kafka-1:9092
> clusterB.bootstrap.servers = serverB-kafka-1:9092
> # Replication policy
> replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
> # Offset/Checkpoint sync
> emit.checkpoints.enabled=true
> emit.checkpoints.interval.seconds=5
> sync.group.offsets.enabled=true
> sync.group.offsets.interval.seconds=5
> offset.lag.max=10
> refresh.topics.interval.seconds=5}}
> ----
> h3. Test results:
> # *Produce 300 messages when MirrorMaker is running*
> *Expected:* Topic offset synced within a minute
> *Result:* ✅ Passed
> # *Consume 100 messages when MirrorMaker is running, then terminate the
> consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
> # *Restart MirrorMaker after test #2*
> *Expected:* Consumer offset synced
> *Result:* ✅ Passed
> # *Repeat test #2 — consume 100 messages when MirrorMaker is running, then
> terminate the consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
> # *Restart MirrorMaker after test #4*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
> # *Consume messages but keep consumer running*
> *Expected:* Offset synced
> *Result:* ✅ Passed
> ----
> h3. Problem:
> Consumer offsets appear to only sync in these cases:
> # When MirrorMaker is restarted and the consumer offset does *not* already
> exist in the secondary cluster (initial sync), or
> # When the consumer is still connected at the time of sync, *or* when the
> consumer has reached the end of the offset (i.e., consumed all available
> messages).
> However, if the consumer exits immediately after consuming some messages (but
> {*}before reaching the end of the topic{*}), the committed offset is *never
> synced* to the target cluster.
> ----
> h3. Additional Context / Related Issues
> This problem seems related to an open discussion in the Apache Kafka mailing
> list:
> *MirrorCheckpointConnector does not replicate final batch of offsets*
> [https://lists.apache.org/thread/dxn9jyotl00f7ov541299cd8tlcl1z00]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
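
The lag figures quoted in the comment can be checked with simple arithmetic. The sketch below is only an illustration, not MM2's translation logic: it assumes a single partition whose offsets start at 0 and whose log end offset is the same on both clusters, and the helper names (consumer_lag, implied_translated_offset) are hypothetical.

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Number of records between the group's committed offset and the log end."""
    if committed_offset < 0 or committed_offset > log_end_offset:
        raise ValueError("committed offset must lie within [0, log_end_offset]")
    return log_end_offset - committed_offset


def implied_translated_offset(log_end_offset: int, downstream_lag: int) -> int:
    """Downstream committed offset that would produce the given downstream lag."""
    return log_end_offset - downstream_lag


# Reporter's test: 300 messages produced, 100 consumed on clusterA.
upstream_small = consumer_lag(300, 100)

# The comment says the offset may translate to 0 (or 1) on clusterB.
downstream_small = consumer_lag(300, 0)

# Larger example from the comment: 1000 produced, 800 consumed,
# with an expected downstream lag of roughly 400.
upstream_large = consumer_lag(1000, 800)
implied_offset = implied_translated_offset(1000, 400)

# Records the group would re-read after cut-over in that example.
redelivered = 800 - implied_offset

print(upstream_small, downstream_small, upstream_large, implied_offset, redelivered)
```

Under these assumptions, an upstream lag of 200 in the larger example corresponds to a translated offset near 600 on the destination, so roughly 200 records would be re-delivered after cut-over, which is the re-delivery the comment says applications need to tolerate unless consumers reach 0 lag first.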