Hi Apache Kafka Community, I'm working on a problem involving cross-cluster data flow with exactly-once guarantees and would love to get your thoughts or experiences on this. The Setup:
- I have a set of *consumers reading from a source topic* located on Kafka cluster *K1*. - There's a *producer writing to a target topic* on a *different Kafka cluster*, let's call it *K2*. - My goal is to *only commit offsets on K1* *after* a successful write to the topic on *K2*. The Challenge: If the producer gets stuck in a transaction (e.g., due to a GC pause), and the consumer is removed from the group (e.g., no heartbeats during the session timeout), then the *transaction coordinator on K2 has no context to determine the validity of the transaction*. If both the source and target topics were on the *same cluster*, I could have used sendOffsetsToTransaction() with ConsumerGroupMetadata. This allows the broker to gate the commit based on the consumer group's generation ID. But since the clusters are different, this isn’t possible. Workarounds Considered: 1. *Static partition-to-consumer assignment*: - Assign each consumer to a fixed set of partitions to avoid partition migration and generation ID mismatch. - Downside: Doesn't scale well and is brittle in dynamic environments. 2. *One producer per partition with deterministic transactional.id <http://transactional.id>* (e.g., transactional.id = topic-partition): - If a partition is reassigned, a new producer with the same transactional.id starts, causing fencing of the old one. - Downside: The number of producers scales linearly with partitions, which isn’t ideal in large-scale systems. The Ask: Both approaches are limiting in terms of *scalability and resilience*. Has anyone faced this issue or found an elegant solution for achieving *transactional guarantees across Kafka clusters*? Would love to hear if: - There are *patterns or tools* in the ecosystem that help with this. - Anyone has explored *outbox patterns*, *exactly-once delivery bridges*, or *custom fencing mechanisms* to solve similar problems. - There's *any roadmap* for Kafka itself to support cross-cluster sendOffsetsToTransaction or a similar API. Any insights, suggestions, or shared pain would be incredibly helpful! Thanks in advance, Pritam Kumar