Hi Luke,

This makes sense to me. I think it's a good plan to try out.

Thanks,
Andrew
On 2026/02/24 12:39:51 Luke Chen wrote:
> Hi Andrew,
>
> Thanks for the review.
>
> I think Fede already answered all the questions.
> But regarding AS5, you made me think more about the possibility of
> supporting unclean leader election. (though I already did that many
> times :))
>
> So, what we can do is:
> 1. In the destination cluster leader, we mirror the batches from the
> source cluster and keep the leader epoch in the batch as is. That is,
> the leader epoch in the batch can be 10 while the local leader epoch
> is 1. The leader epoch cache also updates when receiving batches from
> the source cluster leader, instead of when the local cluster
> leadership changes.
> 2. Because of (1), this destination cluster leader node can act as a
> follower in the source cluster to find out the diverging log offset
> when an unclean leader election happens in the source cluster, because
> the "LastFetchedEpoch" in the fetch request can be set to the correct
> value.
> 3. To avoid the unclean leader election issue described in the KIP
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring#KIP1279:ClusterMirroring-Uncleanleaderelection(LMOisnotsufficient)>,
> we should do the following:
> 3.1. When failing over to the destination cluster, we should store the
> [last mirrored leader epoch] instead of the last mirrored offset.
> 3.2. Force-bump the leader epoch in the destination cluster to a value
> greater than the latest batch leader epoch. That means any leader
> epoch <= the last mirrored leader epoch is already synced up with the
> source cluster.
> 3.3. When failing back, we first query the last mirrored leader epoch
> from the source cluster, then truncate based on it. This is the last
> mirrored leader epoch that matches the source cluster, so every record
> beyond that leader epoch should be truncated.
> 3.4.
> After (3.3), all records should have leader epoch <= the last mirrored
> leader epoch. Then we can send fetch requests as usual and let the
> fetch response handle the truncation, if any. For example, leader
> epoch 3 in the destination cluster ends at offset 10, but leader epoch
> 3 in the source cluster ends at offset 8, so the fetcher can detect
> this and truncate to offset 8 in the destination cluster.
> 3.5. After (3.4), all the leader epochs and records should be in sync
> with the source cluster. Then we can jump back to step 1 to fetch as a
> normal follower, and detect log divergence even if the source cluster
> has had an unclean leader election.
>
> Does this make sense?
> In theory, this might work. I need to think more about it, discuss
> with my team members, and try to implement it to verify.
>
> Thank you,
> Luke
>
>
> On Wed, Feb 18, 2026 at 10:28 PM Federico Valeri <[email protected]>
> wrote:
>
> > Hi Andrew, thanks for the review.
> >
> > Let me try to answer your questions and then other authors can join
> > the discussion.
> >
> > AS1
> > ------
> >
> > Destination topics are created with the same topic IDs using the
> > extended CreateTopics API. Then, data is replicated starting from
> > offset 0 with byte-for-byte batch copying, so destination offsets
> > always match source offsets. When failing over, we record the last
> > mirrored offset (LMO) in the destination cluster. When failing back,
> > the LMO is used for truncation before mirroring the delta; otherwise
> > we start mirroring from scratch by truncating to zero.
> >
> > Retention: If the mirror leader attempts to fetch an offset that is
> > below the current log start offset of the source leader (e.g.
> > fetching offset 50 when the log start offset is 100), the source
> > broker returns an OffsetOutOfRangeException, which the mirror leader
> > handles by truncating to the source's current log start offset and
> > resuming fetching from that point.
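The retention handling described in AS1 above can be sketched roughly as follows. This is an illustrative Python model, not the actual broker code: the class and function names (`SourceLeader`, `mirror_fetch`, `OffsetOutOfRange`) are hypothetical stand-ins.

```python
# Illustrative sketch of the mirror leader's retention handling (AS1).
# All names here are hypothetical stand-ins, not the real broker API.

class OffsetOutOfRange(Exception):
    def __init__(self, log_start_offset):
        self.log_start_offset = log_start_offset

class SourceLeader:
    """Toy source partition whose log start offset has advanced
    because retention deleted old segments."""
    def __init__(self, log_start_offset, end_offset):
        self.log_start_offset = log_start_offset
        self.end_offset = end_offset

    def fetch(self, offset):
        if offset < self.log_start_offset:
            # e.g. fetching offset 50 when the log start offset is 100
            raise OffsetOutOfRange(self.log_start_offset)
        return list(range(offset, self.end_offset))

def mirror_fetch(source, fetch_offset):
    """On an out-of-range error, truncate to the source's current log
    start offset and resume fetching from that point."""
    try:
        return fetch_offset, source.fetch(fetch_offset)
    except OffsetOutOfRange as e:
        new_start = e.log_start_offset  # mirror truncates to this offset
        return new_start, source.fetch(new_start)

source = SourceLeader(log_start_offset=100, end_offset=103)
resume_at, records = mirror_fetch(source, fetch_offset=50)
print(resume_at, records)  # 100 [100, 101, 102]
```

The key point is that the mirror never invents offsets: after truncating, destination offsets still match source offsets exactly.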
> > Compaction: The mirror leader replicates compacted log segments
> > exactly as they exist in the source cluster, maintaining the same
> > offset assignments and gaps.
> >
> > Do you have any specific corner case in mind?
> >
> > AS2
> > ------
> >
> > Agreed. The current AlterShareGroupOffsetsRequest (v0) only includes
> > PartitionIndex and StartOffset, with no epoch field. When mirroring
> > share group offsets across clusters, the epoch is needed to ensure
> > the offset alteration targets the correct leader generation.
> >
> > AS3
> > ------
> >
> > Right, the enum is now fixed. Yes, we will parse from the right and
> > apply the same naming rules used for topic names ;)
> >
> > AS4
> > -------
> >
> > Agreed. I'll try to improve those paragraphs because they are
> > crucial from an operational point of view.
> >
> > Let me briefly explain how it is supposed to work:
> >
> > 9091 (source) -----> 9094 (destination)
> >
> > The single operation that allows an operator to switch all topics at
> > once in case of disaster is the following:
> >
> > bin/kafka-mirrors.sh --bootstrap-server :9094 --remove --topic .*
> > --mirror my-mirror
> >
> > 9091 (source) --x--> 9094 (destination)
> >
> > After that, all mirror topics become detached from the source
> > cluster and start accepting writes (the two clusters are allowed to
> > diverge).
> >
> > When the source cluster is back, the operator can fail back by
> > creating a mirror with the same name on the source cluster (new
> > destination):
> >
> > echo "bootstrap.servers=localhost:9094" > /tmp/my-mirror.properties
> > bin/kafka-mirrors.sh --bootstrap-server :9091 --create --mirror
> > my-mirror --mirror-config /tmp/my-mirror.properties
> > bin/kafka-mirrors.sh --bootstrap-server :9091 --add --topic .*
> > --mirror my-mirror
> >
> > 9091 (destination) <----- 9094 (source)
> >
> > AS5
> > -------
> >
> > This is the core of our design; we arrived at it empirically by
> > trying out different options.
> > We didn't want to change local replication, which is something you
> > would need to do when preserving the source leader epoch. The
> > current design is simple and keeps the epoch domains entirely
> > separate. The destination cluster is in charge of the leader epoch
> > for its own log. The source epoch is only used during the fetch
> > protocol to validate responses and detect divergence.
> >
> > The polarity idea of tracking whether an epoch bump originated from
> > replication vs. a local leadership change is interesting, but it
> > adds significant complexity and coupling between source and
> > destination epochs. Could you clarify what specific scenario
> > polarity tracking would address that the current separation doesn't
> > handle? One case we don't support is unclean leader election
> > reconciliation across clusters; is that the gap you're aiming at?
> >
> > I tried to rewrite the unclean leader election paragraph in the
> > rejected alternatives to be easier to digest. Let me know if it
> > works.
> >
> > On Tue, Feb 17, 2026 at 2:57 PM Andrew Schofield
> > <[email protected]> wrote:
> > >
> > > Hi Fede and friends,
> > > Thanks for the KIP.
> > >
> > > It’s a comprehensive design, easy to read, and has clearly taken a
> > > lot of work. The principle of integrating mirroring into the
> > > brokers makes total sense to me.
> > >
> > > The main comment I have is that mirroring like this cannot handle
> > > situations in which multiple topic-partitions are logically
> > > related, such as transactions, with total fidelity. Each
> > > topic-partition is being replicated as a separate entity. The KIP
> > > calls this out and describes the behaviour thoroughly.
> > >
> > > A few initial comments.
> > >
> > > AS1) Is it true that offsets are always preserved by this KIP? I
> > > *think* so, but I'm not totally sure that it’s true in all cases.
> > > It would certainly be nice.
> > >
> > > AS2) I think you need to add epoch information to
> > > AlterShareGroupOffsetsRequest.
> > > It really should already be there in hindsight, but I think this
> > > KIP requires it.
> > >
> > > AS3) The CoordinatorType enum for MIRROR will need to be 3 because
> > > 2 is SHARE. I’m sure you’ll parse the keys from the right ;)
> > >
> > > AS4) The procedure for achieving a failover could be clearer.
> > > Let’s say that I am using cluster mirroring to achieve DR
> > > replication. My source cluster is utterly lost due to a disaster.
> > > What’s the single operation that I perform to switch all of the
> > > topics mirrored from the lost source cluster to become the active
> > > topics? Similarly for failback.
> > >
> > > AS5) The only piece that I’m really unsure of is the epoch
> > > management. I would have thought that the cluster which currently
> > > has the writable topic-partition would be in charge of the leader
> > > epoch, and that it would not be necessary to perform all of the
> > > gymnastics described in the section on epoch rewriting. I have
> > > read the Rejected Alternatives section too, but I don’t fully
> > > grasp why it was necessary to reject it.
> > >
> > > I wonder if we could store the “polarity” of an epoch, essentially
> > > whether the epoch bump was observed by replication from a source
> > > cluster, or whether it was bumped by a local leadership change
> > > when the topic is locally writable. When a topic-partition
> > > switches from read-only to writable, we should definitely bump the
> > > epoch, and we could record the fact that it was a local epoch.
> > > When connectivity is re-established, you might find that both ends
> > > have declared a local epoch N, but someone has to win.
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 14 Feb 2026, at 07:17, Federico Valeri <[email protected]>
> > > > wrote:
> > > >
> > > > Hi, we would like to start a discussion thread about KIP-1279:
> > > > Cluster Mirroring.
> > > >
> > > > Cluster Mirroring is a new Kafka feature that enables native,
> > > > broker-level topic replication across clusters. Unlike
> > > > MirrorMaker 2 (which runs as an external Connect-based tool),
> > > > Cluster Mirroring is built into the broker itself, allowing
> > > > tighter integration with the controller, coordinator, and
> > > > partition lifecycle.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
> > > >
> > > > There are a few missing bits, but most of the design is there,
> > > > so we think it is the right time to involve the community and
> > > > get feedback. Please help validate our approach.
> > > >
> > > > Thanks
> > > > Fede
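For reference, Luke's failback procedure (steps 3.1 through 3.5 above) can be sketched as a small model. This is illustrative Python with made-up data structures (a log as a list of `(leader_epoch, offset)` records and a per-epoch end-offset map), not the actual log layer; the end offset for an epoch is treated as exclusive, matching "epoch 3 ends at offset 8" meaning the last epoch-3 record is at offset 7.

```python
# Illustrative model of the failback truncation in steps 3.1-3.5.
# Data structures are made up for illustration, not the real log layer.

def truncate_to_epoch(log, last_mirrored_epoch):
    """Step 3.3: drop every record whose leader epoch is beyond the
    last mirrored leader epoch known to the source cluster."""
    return [(e, o) for (e, o) in log if e <= last_mirrored_epoch]

def truncate_on_divergence(log, source_epoch_end_offsets):
    """Step 3.4: for the remaining epochs, truncate where the
    destination's offsets run past the source's (exclusive) end offset
    for the same epoch."""
    return [(e, o) for (e, o) in log
            if o < source_epoch_end_offsets.get(e, float("inf"))]

# Destination wrote epoch 3 up to offset 9, but on the source epoch 3
# ends at offset 8, and epoch 5 is a local (unmirrored) epoch.
dest_log = [(3, 7), (3, 8), (3, 9), (5, 10)]
last_mirrored_epoch = 3       # step 3.1: stored at failover
source_epoch_end = {3: 8}     # epoch 3 ends at offset 8 on the source

log = truncate_to_epoch(dest_log, last_mirrored_epoch)   # drops (5, 10)
log = truncate_on_divergence(log, source_epoch_end)      # drops offsets 8, 9
print(log)  # [(3, 7)]
```

After both truncations the destination log is a prefix of the source log, which is what lets step 3.5 resume normal follower fetching with a valid LastFetchedEpoch.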
