[ 
https://issues.apache.org/jira/browse/KAFKA-19607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014666#comment-18014666
 ] 

Greg Harris commented on KAFKA-19607:
-------------------------------------

[~geric] I'm not sure exactly what you mean by force-sync. I mentioned in my 
last message "you either need to let your consumers reach 0 lag prior to 
cut-over, or ...".

To be more explicit, if you're planning to flip to the target cluster and want 
minimal redelivery, you should:
1. Stop the source producers
2. Wait for source consumers to reach 0 lag and commit offsets
3. Wait for MM2 to translate the offsets to get 0 lag on the target
4. Alter target ACLs and start target producers
5. Start the target consumers
6. Stop MM2 mirroring

Step 3 is the synchronization point, because if you perform step (4) early, 
some of the newly produced data will be dropped (data loss), and if you perform 
step (5) early, MM2 won't be able to sync offsets into an active group (extra 
redelivery).
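
As a rough sketch of the check behind steps 2 and 3 (not part of MM2 itself): 
before moving on to step 4, the consumer group should show 0 lag on both the 
source and the target. The helper names below (partition_lags, 
ready_for_cutover) and the topic "orders" are hypothetical, and the offset 
maps are assumed to be filled in by hand or by a script from the output of 
kafka-consumer-groups.sh --describe run against each cluster.

```python
from typing import Dict, Optional, Tuple

# (topic, partition); hypothetical alias for this sketch.
TopicPartition = Tuple[str, int]


def partition_lags(
    committed: Dict[TopicPartition, Optional[int]],
    log_end: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag: log end offset minus committed offset.

    Assumption: a partition with no committed offset counts as lagging by
    its whole log end offset, so it can never look "caught up" by accident
    unless the partition is empty.
    """
    lags = {}
    for tp, end in log_end.items():
        offset = committed.get(tp)
        lags[tp] = end if offset is None else max(end - offset, 0)
    return lags


def ready_for_cutover(
    source_lags: Dict[TopicPartition, int],
    target_lags: Dict[TopicPartition, int],
) -> bool:
    """True only if both clusters report 0 lag for the same partitions."""
    if not source_lags or set(source_lags) != set(target_lags):
        return False
    return all(lag == 0 for lag in source_lags.values()) and all(
        lag == 0 for lag in target_lags.values()
    )


# Source group is fully caught up (step 2 done).
source_lags = partition_lags({("orders", 0): 300}, {("orders", 0): 300})

# Target still shows 10 records of lag: MM2 has not translated the final
# commit yet, so step 3 is not complete.
target_behind = partition_lags({("orders", 0): 290}, {("orders", 0): 300})

# Target after MM2 has synced the translated offset.
target_synced = partition_lags({("orders", 0): 300}, {("orders", 0): 300})
```

With these inputs, ready_for_cutover(source_lags, target_behind) is False and 
ready_for_cutover(source_lags, target_synced) is True. Each lag is computed 
against its own cluster's log end offsets, so the check does not assume that 
source and target offsets are numerically equal.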

> MirrorMaker2 Offset Replication Issue
> -------------------------------------
>
>                 Key: KAFKA-19607
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19607
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 4.0.0
>            Reporter: geric
>            Priority: Minor
>              Labels: RedHat
>
> I am using *Apache Kafka 4.0* with *MirrorMaker 2* to link the primary 
> cluster ({*}clusterA{*}) to the secondary cluster ({*}clusterB{*}).
> The secondary cluster will not have any producers or consumers until a 
> disaster recovery event occurs, at which point all producers and consumers 
> will switch to it.
> *Setup:*
>  * Dedicated standalone MirrorMaker 2 node
>  * {{IdentityReplicationPolicy}} (no topic renaming)
>  * No clients connected to secondary cluster under normal operation
> *MirrorMaker 2 config:*
>  {{# Cluster aliases
> clusters = clusterA, clusterB
> # Bootstrap servers
> clusterA.bootstrap.servers = serverA-kafka-1:9092
> clusterB.bootstrap.servers = serverB-kafka-1:9092
> # Replication policy
> replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
> # Offset/Checkpoint sync
> emit.checkpoints.enabled=true
> emit.checkpoints.interval.seconds=5
> sync.group.offsets.enabled=true
> sync.group.offsets.interval.seconds=5
> offset.lag.max=10
> refresh.topics.interval.seconds=5}}
> ----
> h3. Test results:
>  # *Produce 300 messages when MirrorMaker is running*
> *Expected:* Topic offset synced within a minute
> *Result:* ✅ Passed
>  # *Consume 100 messages when MirrorMaker is running, then terminate the 
> consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Restart MirrorMaker after test #2*
> *Expected:* Consumer offset synced
> *Result:* ✅ Passed
>  # *Repeat test #2 — consume 100 messages when MirrorMaker is running, then 
> terminate the consumer*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Restart MirrorMaker after test #4*
> *Expected:* Consumer offset synced
> *Result:* ❌ Failed — offset is not synced to clusterB
>  # *Consume messages but keep consumer running*
> *Expected:* Offset synced
> *Result:* ✅ Passed
> ----
> h3. Problem:
> Consumer offsets appear to only sync in these cases:
>  # When MirrorMaker is restarted and the consumer offset does *not* already 
> exist in the secondary cluster (initial sync), or
>  # When the consumer is still connected at the time of sync, *or* when the 
> consumer has reached the end of the offset (i.e., consumed all available 
> messages).
> However, if the consumer exits immediately after consuming some messages (but 
> {*}before reaching the end of the topic{*}), the committed offset is *never 
> synced* to the target cluster.
> ----
> h3. Additional Context / Related Issues
> This problem seems related to an open discussion in the Apache Kafka mailing 
> list:
> *MirrorCheckpointConnector does not replicate final batch of offsets*
> [https://lists.apache.org/thread/dxn9jyotl00f7ov541299cd8tlcl1z00]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
