Fabien LD created KAFKA-6915:

             Summary: MirrorMaker: avoid duplicates when source cluster is 
unreachable for more than session.timeout.ms
                 Key: KAFKA-6915
                 URL: https://issues.apache.org/jira/browse/KAFKA-6915
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: Fabien LD

According to the documentation (see 
[https://kafka.apache.org/11/documentation.html#semantics]), exactly-once 
delivery can be achieved by storing the offsets in the same store as the 
produced data:
{quote}When writing to an external system, the limitation is in the need to 
coordinate the consumer's position with what is actually stored as output. The 
classic way of achieving this would be to introduce a two-phase commit between 
the storage of the consumer position and the storage of the consumer's output. 
But this can be handled more simply and generally by letting the consumer 
store its offset in the same place as its output.{quote}

Indeed, with the current implementation, where the consumer stores its offsets 
in the source cluster, we can get duplicates if the network makes the source 
cluster unreachable for more than {{session.timeout.ms}}.
Once that amount of time has passed, the source cluster rebalances the 
consumer group; later, when the network is back, the generation has changed 
and the consumers cannot commit the offsets for the last batches of records 
they consumed (in fact, for all records processed during the last 
{{auto.commit.interval.ms}}). All those records are therefore processed again 
when the group's consumers come back.
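For reference, the two consumer settings involved look like this in the 
MirrorMaker consumer properties (the values shown are illustrative, roughly 
the defaults of the 1.1.x clients; check your own configuration):

{code}
# Window after which the broker evicts the consumer and rebalances the group;
# an outage longer than this loses the consumer's generation.
session.timeout.ms=10000

# With auto-commit, records processed since the last commit (up to this
# interval) are re-consumed after the rebalance, producing duplicates.
auto.commit.interval.ms=5000
{code}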

Storing the offsets in the target cluster would eliminate this risk of 
duplicate records and would be a nice feature to have.
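To make the proposal concrete, here is a minimal, Kafka-free sketch in Python 
of the "store offsets with the output" pattern the documentation describes. 
The source topic is modelled as a list and the target cluster as a plain dict 
holding both the mirrored records and the last committed offset; all names 
here ({{mirror}}, {{\_\_offset\_\_}}) are hypothetical, and in real 
MirrorMaker the atomic step would instead be a write to the target cluster.

```python
def mirror(source_records, target_store, batch_size=2):
    """Copy records into target_store, committing the consumer position
    atomically with each batch so a restart never reprocesses records
    that were already delivered to the target."""
    # Resume from the offset stored in the *target*, not the source cluster,
    # so a source-side rebalance cannot roll the position back.
    offset = target_store.get("__offset__", 0)
    while offset < len(source_records):
        batch = source_records[offset:offset + batch_size]
        # Output and new offset are written in one atomic step: here a
        # single in-memory update; with Kafka, e.g. one transaction on
        # the target cluster.
        target_store.setdefault("records", []).extend(batch)
        target_store["__offset__"] = offset + len(batch)
        offset += len(batch)
    return target_store

store = {}
mirror(["a", "b", "c"], store)
# Simulate a restart after the source became unreachable mid-stream:
# no duplicates, because the resume point lives in the target store.
mirror(["a", "b", "c", "d"], store)
print(store["records"])  # -> ['a', 'b', 'c', 'd']
```

The point of the sketch is only the resume logic: because the offset is read 
back from the same store the records were written to, re-running the copy 
after a failure continues exactly where delivery stopped.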
