[ 
https://issues.apache.org/jira/browse/KAFKA-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-10048:
--------------------------------------

    Assignee: Andre Araujo

> Possible data gap for a consumer after a failover when using MM2
> ----------------------------------------------------------------
>
>                 Key: KAFKA-10048
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10048
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 2.5.0
>            Reporter: Andre Araujo
>            Assignee: Andre Araujo
>            Priority: Major
>
> I've been looking at some MM2 scenarios and identified a situation where 
> consumers can miss some data in the event of a failover.
>  
> When a consumer subscribes to a topic for the first time and commits offsets, 
> the offsets for every existing partition of that topic will be saved to the 
> cluster's {{__consumer_offsets}} topic. Even if a partition is completely 
> empty, the offset {{0}} will still be saved for the consumer's consumer group.
>  
> When MM2 is replicating the checkpoints to the remote cluster, though, it 
> [ignores anything that has an offset equal to 
> zero|https://github.com/apache/kafka/blob/856e36651203b03bf9a6df2f2d85a356644cbce3/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorCheckpointTask.java#L135],
>  replicating offsets only for partitions that contain data.
>  
> This can lead to a gap in the data consumed by consumers in the following 
> scenario:
>  # Topic is created on the source cluster.
>  # MM2 is configured to replicate the topic and consumer groups.
>  # Producer starts to produce data to the source topic but for some reason 
> some partitions do not get data initially, while others do (skewed keyed 
> messages or bad luck)
>  # Consumers start to consume data from that topic and their consumer groups' 
> offsets are replicated to the target cluster, *but only for partitions that 
> contain data*. The consumers are using the default setting auto.offset.reset 
> = latest.
>  # A consumer failover to the second cluster is performed (for whatever 
> reason), and the offset translation steps are completed. The consumers are not 
> restarted yet.
>  # The producers continue to produce data to the source cluster topic and now 
> produce data to the partitions that were empty before.
>  # *After* the producers start producing data, consumers are started on the 
> target cluster and start consuming.
> For the partitions that already had data before the failover, everything 
> works fine. The consumer offsets will have been translated correctly and the 
> consumers will start consuming from the correct position.
> For the partitions that were empty before the failover, though, any data 
> written by the producers to those partitions *after the failover but before 
> the consumers start* will be completely missed, since the consumers will jump 
> straight to the latest offset when they start due to the lack of a zero 
> offset stored locally on the target cluster.
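
The scenario above can be reproduced on paper with a small, self-contained model. This is only a sketch and does not use any Kafka or MM2 classes: the class name Mm2ZeroOffsetGap and its methods are hypothetical, the checkpoint filter is modelled as a plain "offset > 0" check, and offset translation is assumed to be the identity (source and target offsets line up), which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the scenario; not MM2 code.
public class Mm2ZeroOffsetGap {

    // Models the checkpoint replication described above: committed offsets
    // equal to zero are dropped, so empty partitions get no entry on the target.
    // Assumption: offsets translate 1:1 between source and target.
    static Map<Integer, Long> replicateCheckpoints(Map<Integer, Long> sourceCommitted) {
        Map<Integer, Long> targetCommitted = new HashMap<>();
        for (Map.Entry<Integer, Long> e : sourceCommitted.entrySet()) {
            if (e.getValue() > 0) {
                targetCommitted.put(e.getKey(), e.getValue());
            }
        }
        return targetCommitted;
    }

    // Models where a consumer starts on the target cluster with
    // auto.offset.reset = latest: the committed offset if one exists,
    // otherwise the current log end offset of the partition.
    static long startPosition(Map<Integer, Long> targetCommitted, int partition, long logEndOffset) {
        Long committed = targetCommitted.get(partition);
        return committed != null ? committed : logEndOffset;
    }

    public static void main(String[] args) {
        // Steps 3-4: partition 0 has data and was fully consumed,
        // partition 1 is empty, so the group committed offset 0 for it.
        Map<Integer, Long> sourceCommitted = new HashMap<>();
        sourceCommitted.put(0, 100L);
        sourceCommitted.put(1, 0L);

        // Step 5: failover, only non-zero offsets reach the target cluster.
        Map<Integer, Long> targetCommitted = replicateCheckpoints(sourceCommitted);

        // Step 6: producers keep writing; the mirrored topic on the target now
        // ends at offset 120 for partition 0 and offset 5 for partition 1.
        Map<Integer, Long> targetLogEnd = new HashMap<>();
        targetLogEnd.put(0, 120L);
        targetLogEnd.put(1, 5L);

        // Step 7: consumers start on the target cluster.
        for (int partition = 0; partition < 2; partition++) {
            long start = startPosition(targetCommitted, partition, targetLogEnd.get(partition));
            long expected = sourceCommitted.get(partition);
            System.out.println("partition " + partition + ": starts at " + start
                    + ", records skipped = " + (start - expected));
        }
    }
}
```

In this model partition 0 resumes from its committed position, while partition 1 has no committed offset on the target and falls back to the log end, so the records written to it between the failover and the consumer start are never read. Had the zero offset been replicated as well, startPosition would return 0 for partition 1 and nothing would be skipped.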



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
