Hi Viktor, thank you for taking the time to review our KIP.

svv1 -------
We are aware of KIP-986 and considered it during the design phase. Both KIPs share the same core motivation: replacing MirrorMaker 2 with broker-native replication while preserving exact offsets across clusters. However, they differ in scope and approach. KIP-1279 is deliberately more targeted and pragmatic: we only support unidirectional and asynchronous cross-cluster replication, but in a way that can later be extended to support synchronous replication. Instead of introducing the namespacing complexity of KIP-986, we use a simpler topic-level mirror.name configuration (a hypothetical creation example is sketched below). One of the driving factors of this proposal is to keep broker internals mostly untouched and leverage existing components. The modifications are largely additive: new optional parameters with defaults, new request handlers, and a parallel fetcher manager. The most significant behavioral change is the read-only enforcement in ReplicaManager for mirror partitions.

svv2 -------

This is an interesting point, which I think is mostly relevant for bidirectional setups rather than DR and migration setups. In a way, KIP-1279's architecture already supports this deployment topology without protocol changes. By default, mirror topic partitions are spread across all brokers, and each broker handles both normal (intra-cluster) and mirror (cross-cluster) traffic. You can achieve dedicated mirroring nodes through partition reassignment, moving all mirror topic partitions to selected brokers (see the reassignment sketch below). Once that is done, the remaining brokers never touch the WAN and client traffic is not impacted. I think there is value in providing such a deployment mode, but it can be a follow-up KIP.

Let me also add that mirror failures are isolated at the partition level. A fetch error or malformed batch from the source transitions only the affected partition to FAILED; other mirror partitions and all normal partitions on the same broker continue operating normally. This is fundamentally different from MM2, where a single bad record can stall an entire Connect task, affecting multiple partitions.

svv3 -------

The concern is valid, and we agree the KIP should outline why we believe the current architecture is sound and won't require major refactoring when tiered storage support is added. Luke is working on this and can explain it better than I can, but in the meantime I'll try to summarize my understanding.

When the mirror fetcher encounters OffsetMovedToTieredStorageException from the source cluster, it fetches remote log segment metadata via a new ListRemoteLogMetadata API on the source and writes that metadata into the destination's RemoteLogMetadataManager. This allows the destination to reference the same remote storage segments as the source without re-uploading them. Since mirror topics are read-only, no log segments are uploaded to remote storage during active mirroring; uploading resumes only after the topic becomes writable (failover). Remote log metadata is also synced periodically alongside topic configs and ACLs. During failback, overlapping segments (same offset range, different segment IDs) are resolved by honoring the new source's metadata and marking the stale entries as duplicated, so they are skipped during reads.

The key point is that none of this requires restructuring the coordinator protocol, the state machine, or the __mirror_state topic schema: it is an extension to the data fetch layer (a rough code sketch is included below). We will add a section to the KIP outlining this high-level approach to provide more confidence that the architecture is forward-compatible.

svv4 -------

We understand the concern about terminology overlap with MirrorMaker 2. However, we chose "Cluster Mirroring" intentionally because it is short, describes the feature's purpose (mirroring data across clusters), and signals continuity with the use cases that MM2 serves today. Regarding alternatives: "Cluster Linking" has prior art and trademark associations; "Cross-Cluster Replication" is technically correct, but it could create confusion with KIP-986; and "replicator nodes" describes a deployment topology rather than the feature itself.
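To make the answers above more concrete, here are a few rough sketches; any identifier not in the KIP text is a placeholder. First, for svv1, this is what creating a mirror topic could look like via the standard Admin client, assuming the topic-level mirror.name config key described above (the key name and value format are illustrative, not final):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateMirrorTopicSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "destination:9092");
            try (Admin admin = Admin.create(props)) {
                // Hypothetical: a topic-level "mirror.name" config pointing at the
                // source topic, instead of KIP-986 style namespacing. The KIP would
                // presumably also define configs identifying the source cluster.
                NewTopic mirror = new NewTopic("payments", 12, (short) 3)
                    .configs(Map.of("mirror.name", "payments"));
                admin.createTopics(Collections.singleton(mirror)).all().get();
            }
        }
    }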
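Second, the dedicated mirroring nodes from svv2 need nothing new: the existing alterPartitionReassignments Admin API (or the kafka-reassign-partitions.sh tool) can move all mirror topic partitions onto a subset of brokers. The broker IDs, topic name, and partition count below are made up:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitionReassignment;
    import org.apache.kafka.common.TopicPartition;

    public class DedicateMirrorBrokersSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "destination:9092");
            try (Admin admin = Admin.create(props)) {
                // Pin every partition of the mirror topic to brokers 4, 5 and 6,
                // so the remaining brokers never open a WAN connection.
                Map<TopicPartition, Optional<NewPartitionReassignment>> moves = new HashMap<>();
                for (int p = 0; p < 12; p++) {
                    moves.put(new TopicPartition("payments", p),
                        Optional.of(new NewPartitionReassignment(Arrays.asList(4, 5, 6))));
                }
                admin.alterPartitionReassignments(moves).all().get();
            }
        }
    }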
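Third, a very rough sketch of the tiered storage handling from svv3, to show the shape of the fetch-path extension. OffsetMovedToTieredStorageException and RemoteLogMetadataManager are existing Kafka types; SourceClusterClient stands in for whatever client ends up wrapping the proposed ListRemoteLogMetadata API:

    import java.util.List;
    import org.apache.kafka.common.TopicIdPartition;
    import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
    import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

    class MirrorTieredStorageSketch {
        // Placeholder for the proposed source-cluster API.
        interface SourceClusterClient {
            List<RemoteLogSegmentMetadata> listRemoteLogMetadata(TopicIdPartition partition, long fromOffset);
        }

        private final SourceClusterClient source;
        private final RemoteLogMetadataManager destinationRlmm;

        MirrorTieredStorageSketch(SourceClusterClient source, RemoteLogMetadataManager destinationRlmm) {
            this.source = source;
            this.destinationRlmm = destinationRlmm;
        }

        // Called when a mirror fetch fails with OffsetMovedToTieredStorageException.
        void onOffsetMovedToTieredStorage(TopicIdPartition partition, long fetchOffset) throws Exception {
            // 1. Ask the source for the remote segment metadata covering the
            //    offsets that are no longer in its local log.
            List<RemoteLogSegmentMetadata> segments = source.listRemoteLogMetadata(partition, fetchOffset);

            // 2. Register that metadata in the destination's RLMM, so the
            //    destination references the same remote segments without
            //    re-uploading them.
            for (RemoteLogSegmentMetadata segment : segments) {
                destinationRlmm.addRemoteLogSegmentMetadata(segment).get();
            }

            // 3. The fetcher then resumes from the source's local log start
            //    offset instead of transitioning the partition to FAILED.
        }
    }

Again, this is just my current understanding; Luke's section in the KIP will be authoritative.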
On Tue, Feb 17, 2026 at 11:32 AM Viktor Somogyi-Vass <[email protected]> wrote:
>
> Hi Federico,
>
> I have only a few high level questions.
>
> svv1. There is some previous work in KIP-986:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-986%3A+Cross-Cluster+Replication.
> As I see it, your proposal seems more targeted than that one. I'm curious
> whether you considered that during the writing? It might be useful to
> compare them as there are some differences, although the 2 KIPs try to
> achieve the same.
>
> svv2. One of advantages of MM2 is that it separates replication, which
> therefore won't put extra load on the brokers and also it limits the blast
> radius of any failures. The KIP separates thread pools and resources inside
> the brokers, but it essentially says that replicators should scale
> vertically to handle the extra replication traffic. This I think creates
> brokers where failures have more impact. My suggestion would be to consider
> replicator only nodes. Nodes where only cross-cluster replication and
> client traffic happens make it much easier to plan and scale a cluster.
> Also it would possibly keep the internals of the brokers mostly untouched
> and we would also segregate replication traffic from normal brokers. We
> should be able to keep the benefits of the KIP (exact offset mapping) but
> also gain some of the improvements brought by MM2 (separated failure
> mechanism).
>
> svv3. As you write, tiered storage and diskless topics are out of the
> scope. While I agree with the latter as the KIP is currently not yet
> implemented, I miss the tiered storage parts. I think we should benefit at
> least from a high level plan to see that your proposal is sound and that we
> wouldn't need any major refactors when designing the tiered storage
> cross-cluster replication.
>
> svv4. Lastly, I think we shouldn't call KIP-1279 "cluster mirroring" as it
> is very confusing with the current mirror maker terminology. Let's not
> overload that. "Cluster-linking" or "replicator nodes" may sound better.
> What do you think?
>
> Best,
> Viktor
>
> On Sat, Feb 14, 2026 at 9:38 PM vaquar khan <[email protected]> wrote:
>
> > Hi Fede,
> >
> > I reviewed the KIP-1279 proposal yesterday and corrected the KIP number. I
> > now have time to share my very detailed observations. While I fully support
> > the goal of removing the operational complexity of Kafka, the design
> > appears to trade that complexity for broker stability.
> >
> > By moving WAN replication into the broker's core runtime, we are
> > effectively removing the failure domain isolation that MirrorMaker 2
> > provides. We risk coupling the stability of our production clusters to the
> > instability of cross-datacenter networks. Before this KIP moves to a vote, I
> > strongly recommend you and other authors to address the following stability
> > gaps. Without concrete answers here, the risk profile is likely too high
> > for mission-critical deployments.
> >
> > 1. The Thundering Herd and Memory Isolation Risk
> > In the current architecture, MirrorMaker 2 (MM2) Connect workers provide a
> > physical failure domain through a separate JVM heap. This isolates the
> > broker from the memory pressure and Garbage Collection (GC) impact caused
> > by replication surges. In this proposal, that pressure hits the broker's
> > core runtime directly.
> >
> > The Gap: We need simulation data for a sustained link outage (e.g., 6 hours
> > on 10Gbps). When 5,000 partitions resume fetching, does the resulting
> > backfill I/O and heap pressure cause GC pauses that push P99 Produce
> > latency on the target cluster over 10ms? We must ensure that a massive
> > catch-up phase does not starve the broker's Request Handler threads or
> > destabilize the JVM.
> >
> > 2. Blast Radius (Poison Pill Problem)
> > The Gap: If a source broker sends a malformed batch (e.g., bit rot), does
> > it crash the entire broker process? In MM2, this kills a single task. We
> > need confirmation that exceptions are isolated to the replication thread
> > pool and will not trigger a node-wide panic.
> >
> > 3. Control Plane Saturation
> > The Gap: How does the system handle a "link flap" event where 50,000
> > partitions transition states rapidly? We need to verify that the resulting
> > flood of metadata updates will not block the Controller from processing
> > critical ISR changes for local topics.
> >
> > 4. Transactional Integrity
> > "Byte-for-byte" replication copies transaction markers but not the
> > Coordinator's state (PIDs).
> > The Gap: How does the destination broker validate an aborted transaction
> > without the source PID? We should avoid creating "zombie" transactions that
> > look valid but cannot be authoritatively managed.
> >
> > 5. Infinite Loop Prevention
> > Since byte-for-byte precludes injecting lineage headers (e.g., dc-source), we
> > lose the standard mechanism for detecting loops in mesh topologies (A→B→A).
> > The Gap: Relying solely on topic naming conventions is operationally
> > fragile. What is the deterministic mechanism to prevent infinite recursion?
> >
> > 6. Data Divergence and Epoch Reconciliation
> > The current proposal explicitly excludes support for unclean leader
> > election because there is no mechanism for a "shared leader epoch" between
> > clusters.
> > The Gap: Without epoch reconciliation, if the source cluster experiences an
> > unclean election, the source and destination logs will diverge. If an
> > operator later attempts a failback (reverse mirroring), the clusters will
> > contain inconsistent data for the same offset, leading to potential silent
> > data corruption or permanent replication failure.
> >
> > 7. Tiered Storage Operational Gaps
> > The design states that Tiered Storage is not initially supported and that a
> > mirror follower encountering an OffsetMovedToTieredStorageException will
> > simply mark the partition as FAILED.
> > The Gap: For mission-critical clusters using Tiered Storage for long-term
> > retention, this creates an operational cliff. Mirroring will fail as soon
> > as the source cluster offloads data to remote storage. We need a roadmap
> > for how native mirroring will eventually interact with tiered segments
> > without failing the partition.
> >
> > 8. Transactional State and PID Mapping
> > While the KIP proposes a deterministic formula for rewriting Producer IDs,
> > calculated as destinationProducerId = sourceProducerId + 2, it does not
> > replicate the transaction_state metadata.
> > The Gap: How does the destination broker authoritatively validate or expire
> > hanging transactions if the source PID state is rewritten but the
> > transaction coordinator state is missing?
> > We risk a scenario where consumers encounter zombie transactions that can
> > never be decided on the destination cluster.
> >
> > This is a big change to how our system is built. We need to make sure it
> > doesn't create a weak link that could bring the whole system down; we should
> > ensure it does not introduce a new single point of failure.
> >
> > Regards,
> > Viquar Khan
> > Linkedin - https://www.linkedin.com/in/vaquar-khan-b695577/
> > Book - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> > GitBook - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> > Stack - https://stackoverflow.com/users/4812170/vaquar-khan
> > github - https://github.com/vaquarkhan
> >
> > On Sat, 14 Feb 2026 at 01:18, Federico Valeri <[email protected]> wrote:
> >
> > > Hi, we would like to start a discussion thread about KIP-1279: Cluster
> > > Mirroring.
> > >
> > > Cluster Mirroring is a new Kafka feature that enables native,
> > > broker-level topic replication across clusters. Unlike MirrorMaker 2
> > > (which runs as an external Connect-based tool), Cluster Mirroring is
> > > built into the broker itself, allowing tighter integration with the
> > > controller, coordinator, and partition lifecycle.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
> > >
> > > There are a few missing bits, but most of the design is there, so we
> > > think it is the right time to involve the community and get feedback.
> > > Please help validating our approach.
> > >
> > > Thanks
> > > Fede
> > >
> >
