Hi Luke,

This makes sense to me. I think it's a good plan to try out.

Thanks,
Andrew
On 2026/02/24 12:39:51 Luke Chen wrote:
> Hi Andrew,
>
> Thanks for the review.
>
> I think Fede already answered all the questions.
> But regarding AS5, you made me think more about the possibility of
> supporting unclean leader election. (though I already did that many
> times :))
>
> So, what we can do is:
> 1. In the destination cluster leader, we mirror the batches from the
> source cluster and keep the leader epoch in the batch as is. That is,
> the leader epoch in the batch can be 10 while the local leader epoch
> is 1. The leader epoch cache also updates when receiving batches from
> the source cluster leader, instead of when the local cluster
> leadership changes.
> 2. Because of (1), this destination cluster leader node can act as a
> follower in the source cluster to find out the diverging log offset
> when an unclean leader election happens in the source cluster, because
> the "LastFetchedEpoch" in the fetch request can be set to the correct
> value.
> 3. To avoid the unclean leader election issue described in the KIP
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring#KIP1279:ClusterMirroring-Uncleanleaderelection(LMOisnotsufficient)>,
> we should do the following:
> 3.1. When failing over to the destination cluster, we should store the
> [last mirrored leader epoch] instead of the last mirrored offset.
> 3.2. Force-bump the leader epoch in the destination cluster to a value
> greater than the latest batch leader epoch. That means any leader
> epoch <= the last mirrored leader epoch is already synced up with the
> source cluster.
> 3.3. When failing back, we first query the last mirrored leader epoch
> from the source cluster, then truncate based on it. This is the last
> mirrored leader epoch that matches the source cluster, so every record
> beyond that leader epoch should be truncated.
> 3.4.
> After (3.3), all records should have leader epoch <= the last mirrored
> leader epoch. Then we can send fetch requests as usual and let the
> fetch response handle the truncation, if any. For example, leader
> epoch 3 in the destination cluster ends at offset 10, but leader epoch
> 3 in the source cluster ends at offset 8, so the fetcher can detect
> this and truncate to offset 8 in the destination cluster.
> 3.5. After (3.4), all the leader epochs and records should be in sync
> with the source cluster. Then we can jump back to step 1 to fetch as a
> normal follower, and detect log divergence even if the source cluster
> has had an unclean leader election.
>
> Does this make sense?
> In theory, this might work. I need to think more about it, discuss
> with my team members, and try to implement it to verify.
>
> Thank you,
> Luke
>
>
> On Wed, Feb 18, 2026 at 10:28 PM Federico Valeri <[email protected]>
> wrote:
>
> > Hi Andrew, thanks for the review.
> >
> > Let me try to answer your questions and then other authors can join
> > the discussion.
> >
> > AS1
> > ------
> >
> > Destination topics are created with the same topic IDs using the
> > extended CreateTopics API. Then, data is replicated starting from
> > offset 0 with byte-for-byte batch copying, so destination offsets
> > always match source offsets. When failing over, we record the last
> > mirrored offset (LMO) in the destination cluster. When failing back,
> > the LMO is used for truncation before mirroring the delta; otherwise
> > we start mirroring from scratch by truncating to zero.
> >
> > Retention: If the mirror leader attempts to fetch an offset that is
> > below the current log start offset of the source leader (e.g.
> > fetching offset 50 when the log start offset is 100), the source
> > broker returns an OffsetOutOfRangeException, which the mirror leader
> > handles by truncating to the source's current log start offset and
> > resuming fetching from that point.
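The retention handling described in AS1 above can be sketched roughly as follows. This is an illustrative Python model, not the actual broker code: the class and function names (`SourceLeader`, `mirror_fetch`, `OffsetOutOfRange`) are hypothetical stand-ins.

```python
# Illustrative sketch of the mirror leader's retention handling (AS1).
# All names here are hypothetical stand-ins, not the real broker API.

class OffsetOutOfRange(Exception):
    def __init__(self, log_start_offset):
        self.log_start_offset = log_start_offset

class SourceLeader:
    """Toy source partition whose log start offset has advanced
    because retention deleted old segments."""
    def __init__(self, log_start_offset, end_offset):
        self.log_start_offset = log_start_offset
        self.end_offset = end_offset

    def fetch(self, offset):
        if offset < self.log_start_offset:
            # e.g. fetching offset 50 when the log start offset is 100
            raise OffsetOutOfRange(self.log_start_offset)
        return list(range(offset, self.end_offset))

def mirror_fetch(source, fetch_offset):
    """On an out-of-range error, truncate to the source's current log
    start offset and resume fetching from that point."""
    try:
        return fetch_offset, source.fetch(fetch_offset)
    except OffsetOutOfRange as e:
        new_start = e.log_start_offset  # mirror truncates to this offset
        return new_start, source.fetch(new_start)

source = SourceLeader(log_start_offset=100, end_offset=103)
resume_at, records = mirror_fetch(source, fetch_offset=50)
print(resume_at, records)  # 100 [100, 101, 102]
```

The key point is that the mirror never invents offsets: after truncating, destination offsets still match source offsets exactly.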
> > Compaction: The mirror leader replicates compacted log segments
> > exactly as they exist in the source cluster, maintaining the same
> > offset assignments and gaps.
> >
> > Do you have any specific corner case in mind?
> >
> > AS2
> > ------
> >
> > Agreed. The current AlterShareGroupOffsetsRequest (v0) only includes
> > PartitionIndex and StartOffset, with no epoch field. When mirroring
> > share group offsets across clusters, the epoch is needed to ensure
> > the offset alteration targets the correct leader generation.
> >
> > AS3
> > ------
> >
> > Right, the enum is now fixed. Yes, we will parse from the right and
> > apply the same naming rules used for topic names ;)
> >
> > AS4
> > -------
> >
> > Agreed. I'll try to improve those paragraphs because they are
> > crucial from an operational point of view.
> >
> > Let me briefly explain how it is supposed to work:
> >
> > 9091 (source) -----> 9094 (destination)
> >
> > The single operation that allows an operator to switch all topics at
> > once in case of disaster is the following:
> >
> > bin/kafka-mirrors.sh --bootstrap-server :9094 --remove --topic .*
> > --mirror my-mirror
> >
> > 9091 (source) --x--> 9094 (destination)
> >
> > After that, all mirror topics become detached from the source
> > cluster and start accepting writes (the two clusters are allowed to
> > diverge).
> >
> > When the source cluster is back, the operator can fail back by
> > creating a mirror with the same name on the source cluster (new
> > destination):
> >
> > echo "bootstrap.servers=localhost:9094" > /tmp/my-mirror.properties
> > bin/kafka-mirrors.sh --bootstrap-server :9091 --create --mirror
> > my-mirror --mirror-config /tmp/my-mirror.properties
> > bin/kafka-mirrors.sh --bootstrap-server :9091 --add --topic .*
> > --mirror my-mirror
> >
> > 9091 (destination) <----- 9094 (source)
> >
> > AS5
> > -------
> >
> > This is the core of our design; we arrived at it empirically by
> > trying out different options.
> > We didn't want to change local replication, which is something you
> > would need to do when preserving the source leader epoch. The
> > current design is simple and keeps the epoch domains entirely
> > separate. The destination cluster is in charge of the leader epoch
> > for its own log. The source epoch is only used during the fetch
> > protocol to validate responses and detect divergence.
> >
> > The polarity idea of tracking whether an epoch bump originated from
> > replication vs. a local leadership change is interesting, but it
> > adds significant complexity and coupling between source and
> > destination epochs. Could you clarify what specific scenario
> > polarity tracking would address that the current separation doesn't
> > handle? One case we don't support is unclean leader election
> > reconciliation across clusters; is that the gap you're aiming at?
> >
> > I tried to rewrite the unclean leader election paragraph in the
> > rejected alternatives to be easier to digest. Let me know if it
> > works.
> >
> > On Tue, Feb 17, 2026 at 2:57 PM Andrew Schofield
> > <[email protected]> wrote:
> > >
> > > Hi Fede and friends,
> > > Thanks for the KIP.
> > >
> > > It’s a comprehensive design, easy to read, and has clearly taken a
> > > lot of work. The principle of integrating mirroring into the
> > > brokers makes total sense to me.
> > >
> > > The main comment I have is that mirroring like this cannot handle
> > > situations in which multiple topic-partitions are logically
> > > related, such as transactions, with total fidelity. Each
> > > topic-partition is being replicated as a separate entity. The KIP
> > > calls this out and describes the behaviour thoroughly.
> > >
> > > A few initial comments.
> > >
> > > AS1) Is it true that offsets are always preserved by this KIP? I
> > > *think* so, but I'm not totally sure that it’s true in all cases.
> > > It would certainly be nice.
> > >
> > > AS2) I think you need to add epoch information to
> > > AlterShareGroupOffsetsRequest.
> > > It really should already be there in hindsight, but I think this
> > > KIP requires it.
> > >
> > > AS3) The CoordinatorType enum for MIRROR will need to be 3 because
> > > 2 is SHARE. I’m sure you’ll parse the keys from the right ;)
> > >
> > > AS4) The procedure for achieving a failover could be clearer.
> > > Let’s say that I am using cluster mirroring to achieve DR
> > > replication. My source cluster is utterly lost due to a disaster.
> > > What’s the single operation that I perform to switch all of the
> > > topics mirrored from the lost source cluster to become the active
> > > topics? Similarly for failback.
> > >
> > > AS5) The only piece that I’m really unsure of is the epoch
> > > management. I would have thought that the cluster which currently
> > > has the writable topic-partition would be in charge of the leader
> > > epoch, and that it would not be necessary to perform all of the
> > > gymnastics described in the section on epoch rewriting. I have
> > > read the Rejected Alternatives section too, but I don’t fully
> > > grasp why it was necessary to reject it.
> > >
> > > I wonder if we could store the “polarity” of an epoch, essentially
> > > whether the epoch bump was observed by replication from a source
> > > cluster, or whether it was bumped by a local leadership change
> > > when the topic is locally writable. When a topic-partition
> > > switches from read-only to writable, we should definitely bump the
> > > epoch, and we could record the fact that it was a local epoch.
> > > When connectivity is re-established, you might find that both ends
> > > have declared a local epoch N, but someone has to win.
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 14 Feb 2026, at 07:17, Federico Valeri <[email protected]>
> > > > wrote:
> > > >
> > > > Hi, we would like to start a discussion thread about KIP-1279:
> > > > Cluster Mirroring.
> > > >
> > > > Cluster Mirroring is a new Kafka feature that enables native,
> > > > broker-level topic replication across clusters. Unlike
> > > > MirrorMaker 2 (which runs as an external Connect-based tool),
> > > > Cluster Mirroring is built into the broker itself, allowing
> > > > tighter integration with the controller, coordinator, and
> > > > partition lifecycle.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
> > > >
> > > > There are a few missing bits, but most of the design is there,
> > > > so we think it is the right time to involve the community and
> > > > get feedback. Please help validate our approach.
> > > >
> > > > Thanks
> > > > Fede
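For reference, Luke's failback procedure (steps 3.1 through 3.5 above) can be sketched as a small model. This is illustrative Python with made-up data structures (a log as a list of `(leader_epoch, offset)` records and a per-epoch end-offset map), not the actual log layer; the end offset for an epoch is treated as exclusive, matching "epoch 3 ends at offset 8" meaning the last epoch-3 record is at offset 7.

```python
# Illustrative model of the failback truncation in steps 3.1-3.5.
# Data structures are made up for illustration, not the real log layer.

def truncate_to_epoch(log, last_mirrored_epoch):
    """Step 3.3: drop every record whose leader epoch is beyond the
    last mirrored leader epoch known to the source cluster."""
    return [(e, o) for (e, o) in log if e <= last_mirrored_epoch]

def truncate_on_divergence(log, source_epoch_end_offsets):
    """Step 3.4: for the remaining epochs, truncate where the
    destination's offsets run past the source's (exclusive) end offset
    for the same epoch."""
    return [(e, o) for (e, o) in log
            if o < source_epoch_end_offsets.get(e, float("inf"))]

# Destination wrote epoch 3 up to offset 9, but on the source epoch 3
# ends at offset 8, and epoch 5 is a local (unmirrored) epoch.
dest_log = [(3, 7), (3, 8), (3, 9), (5, 10)]
last_mirrored_epoch = 3       # step 3.1: stored at failover
source_epoch_end = {3: 8}     # epoch 3 ends at offset 8 on the source

log = truncate_to_epoch(dest_log, last_mirrored_epoch)   # drops (5, 10)
log = truncate_on_divergence(log, source_epoch_end)      # drops offsets 8, 9
print(log)  # [(3, 7)]
```

After both truncations the destination log is a prefix of the source log, which is what lets step 3.5 resume normal follower fetching with a valid LastFetchedEpoch.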
