[
https://issues.apache.org/jira/browse/KAFKA-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mickael Maison reassigned KAFKA-10048:
--------------------------------------
Assignee: Andre Araujo
> Possible data gap for a consumer after a failover when using MM2
> ----------------------------------------------------------------
>
> Key: KAFKA-10048
> URL: https://issues.apache.org/jira/browse/KAFKA-10048
> Project: Kafka
> Issue Type: Bug
> Components: mirrormaker
> Affects Versions: 2.5.0
> Reporter: Andre Araujo
> Assignee: Andre Araujo
> Priority: Major
>
> I've been looking at some MM2 scenarios and identified a situation where
> consumers can miss consuming some data in the event of a failover.
>
> When a consumer subscribes to a topic for the first time and commits offsets,
> the offsets for every existing partition of that topic will be saved to the
> cluster's {{__consumer_offsets}} topic. Even if a partition is completely
> empty, the offset {{0}} will still be saved for the consumer's consumer group.
>
> When MM2 is replicating the checkpoints to the remote cluster, though, it
> [ignores anything that has an offset equal to
> zero|https://github.com/apache/kafka/blob/856e36651203b03bf9a6df2f2d85a356644cbce3/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorCheckpointTask.java#L135],
> replicating offsets only for partitions that contain data.
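
To make the filtering concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not MM2's actual code: the helper `checkpoints_to_emit`, the topic name `orders`, and the offsets are all made up for illustration, and offset translation is assumed to be the identity.

```python
def checkpoints_to_emit(committed_offsets):
    """Keep only committed offsets greater than zero (hypothetical helper).

    Models the filter described above: an offset equal to zero is dropped,
    so an empty partition produces no checkpoint on the target cluster.
    """
    return {tp: offset for tp, offset in committed_offsets.items() if offset > 0}


# Committed offsets of one consumer group on the source cluster.
# Partition 1 is empty, so the group committed offset 0 for it.
source_commits = {("orders", 0): 42, ("orders", 1): 0, ("orders", 2): 7}

replicated = checkpoints_to_emit(source_commits)
print(replicated)
```

In this model partition 1 has no entry in `replicated`, which corresponds to the group having no committed offset for that partition on the target cluster.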
>
> This can lead to a gap in the data consumed by consumers in the following
> scenario:
> # Topic is created on the source cluster.
> # MM2 is configured to replicate the topic and consumer groups.
> # Producer starts to produce data to the source topic but for some reason
> some partitions do not get data initially, while others do (skewed keyed
> messages or bad luck)
> # Consumers start to consume data from that topic and their consumer groups'
> offsets are replicated to the target cluster, *but only for partitions that
> contain data*. The consumers are using the default setting auto.offset.reset
> = latest.
> # A consumer failover to the second cluster is performed (for whatever
> reason), and the offset translation steps are completed. The consumers are not
> restarted yet.
> # The producers continue to produce data to the source cluster topic and now
> produce data to the partitions that were empty before.
> # *After* the producers start producing data, consumers are started on the
> target cluster and start consuming.
>
> For the partitions that already had data before the failover, everything
> works fine. The consumer offsets will have been translated correctly and the
> consumers will start consuming from the correct position.
>
> For the partitions that were empty before the failover, though, any data
> written by the producers to those partitions *after the failover but before
> the consumers start* will be completely missed, since the consumers will jump
> straight to the latest offset when they start due to the lack of a zero
> offset stored locally on the target cluster.
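
The effect on a single partition can be sketched with a small Python model of how a consumer picks its starting position. This is a simplification under stated assumptions, not the Kafka client's implementation: `starting_position` is a hypothetical helper, the record count is invented, and only the `latest` and `earliest` reset policies are modelled.

```python
def starting_position(committed, log_start, log_end, auto_offset_reset="latest"):
    """Offset a consumer starts from on one partition (simplified model).

    committed is the group's committed offset, or None if there is none.
    """
    if committed is not None:
        return committed
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "earliest":
        return log_start
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")


# A partition that was empty at failover time. Assume 5 records were
# replicated to it before the consumers started on the target cluster.
log_start, log_end = 0, 5

# No checkpoint was replicated: the consumer jumps to the end of the log.
without_checkpoint = starting_position(None, log_start, log_end)
# A zero offset had been replicated: the consumer starts from the beginning.
with_zero_checkpoint = starting_position(0, log_start, log_end)

print("skipped without checkpoint:", without_checkpoint - log_start)
print("skipped with zero checkpoint:", with_zero_checkpoint - log_start)
```

In this model, `auto.offset.reset = earliest` would also start the consumer at the log start for this partition, but it would apply to every partition without a committed offset, so it is a different trade-off from replicating the zero offset.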