Hi Fede,

Thanks for your response.

AS1: Thanks for the clarification.
AS2: I expect you'll include a version bump of AlterShareGroupOffsets in this
KIP, but that's a small matter compared with the rest of the protocol changes.

AS3: OK.

AS4: Thanks for the details. My only comment is that it might be a bit
laborious when you want to fail over all topics. I suggest adding
`--all-topics` so you could do:

$ bin/kafka-mirror.sh --bootstrap-server :9094 --remove --all-topics --mirror my-mirror

AS5: Thanks for the response. I understand there are good reasons for the way
epochs are handled in the KIP. I see that there is a sub-document for the KIP
about unclean leader election. I'll spend some time reviewing that.

Thanks,
Andrew

On 2026/02/18 13:27:07 Federico Valeri wrote:
> Hi Andrew, thanks for the review.
>
> Let me try to answer your questions and then other authors can join
> the discussion.
>
> AS1
> ------
>
> Destination topics are created with the same topic IDs using the
> extended CreateTopics API. Then, data is replicated starting from
> offset 0 with byte-for-byte batch copying, so destination offsets
> always match source offsets. When failing over, we record the last
> mirrored offset (LMO) in the destination cluster. When failing back,
> the LMO is used to truncate and then mirror only the delta;
> otherwise, we start mirroring from scratch by truncating to zero.
>
> Retention: if the mirror leader attempts to fetch an offset that is
> below the current log start offset of the source leader (e.g. fetching
> offset 50 when the log start offset is 100), the source broker returns
> an OffsetOutOfRangeException, which the mirror leader handles by
> truncating to the source's current log start offset and resuming
> fetching from that point.
>
> Compaction: the mirror leader replicates compacted log segments
> exactly as they exist in the source cluster, maintaining the same
> offset assignments and gaps.
>
> Do you have any specific corner case in mind?
>
> AS2
> ------
>
> Agreed.
> The current AlterShareGroupOffsetsRequest (v0) only includes
> PartitionIndex and StartOffset, with no epoch field. When mirroring
> share group offsets across clusters, the epoch is needed to ensure the
> offset alteration targets the correct leader generation.
>
> AS3
> ------
>
> Right, the enum is now fixed. Yes, we will parse from the right and
> apply the same naming rules used for the topic name ;)
>
> AS4
> ------
>
> Agreed. I'll try to improve those paragraphs because they are crucial
> from an operational point of view.
>
> Let me briefly explain how it is supposed to work:
>
> 9091 (source) -----> 9094 (destination)
>
> The single operation that allows an operator to switch all topics at
> once in case of disaster is the following:
>
> bin/kafka-mirror.sh --bootstrap-server :9094 --remove --topic .* --mirror my-mirror
>
> 9091 (source) --x--> 9094 (destination)
>
> After that, all mirror topics become detached from the source cluster
> and start accepting writes (the two clusters are allowed to diverge).
>
> When the source cluster is back, the operator can fail back by creating
> a mirror with the same name on the source cluster (the new destination):
>
> echo "bootstrap.servers=localhost:9094" > /tmp/my-mirror.properties
> bin/kafka-mirror.sh --bootstrap-server :9091 --create --mirror my-mirror --mirror-config /tmp/my-mirror.properties
> bin/kafka-mirror.sh --bootstrap-server :9091 --add --topic .* --mirror my-mirror
>
> 9091 (destination) <----- 9094 (source)
>
> AS5
> ------
>
> This is the core of our design, and we reached it empirically by
> trying out different options. We didn't want to change local
> replication, and that is something you would need to do to preserve
> the source leader epoch. The current design is simple and keeps the
> epoch domains entirely separate. The destination cluster is in charge
> of the leader epoch for its own log. The source epoch is only used
> during the fetch protocol to validate responses and detect divergence.
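> To make the retention handling described under AS1 concrete, here is a
> small Python sketch. It is illustrative only: `SourcePartition`,
> `OffsetOutOfRange` and `mirror_fetch` are made-up names standing in for
> the broker's actual replica fetcher path, not KIP code.

```python
class OffsetOutOfRange(Exception):
    """Stand-in for the broker's OffsetOutOfRangeException; carries the
    source's current log start offset so the caller can recover."""
    def __init__(self, log_start_offset):
        super().__init__(f"offset below log start offset {log_start_offset}")
        self.log_start_offset = log_start_offset

class SourcePartition:
    """Toy model of the source leader's view of a partition log."""
    def __init__(self, log_start_offset, log_end_offset):
        self.log_start_offset = log_start_offset
        self.log_end_offset = log_end_offset

    def fetch(self, offset):
        # The source broker rejects fetches below the log start offset
        # (records there were already deleted by retention).
        if offset < self.log_start_offset:
            raise OffsetOutOfRange(self.log_start_offset)
        return list(range(offset, self.log_end_offset))

def mirror_fetch(source, fetch_offset):
    """Fetch from the source; on an out-of-range error, truncate to the
    source's current log start offset and resume from that point."""
    try:
        return fetch_offset, source.fetch(fetch_offset)
    except OffsetOutOfRange as e:
        new_offset = e.log_start_offset
        return new_offset, source.fetch(new_offset)
```

> E.g. fetching offset 50 when the source's log start offset is 100
> resumes at offset 100, matching the AS1 description.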
>
> The polarity idea of tracking whether an epoch bump originated from
> replication vs. a local leadership change is interesting, but adds
> significant complexity and coupling between source and destination
> epochs. Could you clarify what specific scenario polarity tracking
> would address that the current separation doesn't handle? One case we
> don't support is unclean leader election reconciliation across
> clusters; is that the gap you're aiming at?
>
> I tried to rewrite the unclean leader election paragraph in the
> rejected alternatives to be easier to digest. Let me know if it works.
>
> On Tue, Feb 17, 2026 at 2:57 PM Andrew Schofield
> <[email protected]> wrote:
> >
> > Hi Fede and friends,
> >
> > Thanks for the KIP.
> >
> > It’s a comprehensive design, easy to read, and has clearly taken a
> > lot of work. The principle of integrating mirroring into the brokers
> > makes total sense to me.
> >
> > The main comment I have is that mirroring like this cannot handle
> > situations in which multiple topic-partitions are logically related,
> > such as transactions, with total fidelity. Each topic-partition is
> > being replicated as a separate entity. The KIP calls this out and
> > describes the behaviour thoroughly.
> >
> > A few initial comments.
> >
> > AS1) Is it true that offsets are always preserved by this KIP? I
> > *think* so, but I'm not totally sure that it’s true in all cases. It
> > would certainly be nice.
> >
> > AS2) I think you need to add epoch information to
> > AlterShareGroupOffsetsRequest. It really should already be there in
> > hindsight, but I think this KIP requires it.
> >
> > AS3) The CoordinatorType enum for MIRROR will need to be 3 because 2
> > is SHARE. I’m sure you’ll parse the keys from the right ;)
> >
> > AS4) The procedure for achieving a failover could be clearer. Let’s
> > say that I am using cluster mirroring to achieve DR replication. My
> > source cluster is utterly lost due to a disaster.
> > What’s the single operation that I perform to switch all of the
> > topics mirrored from the lost source cluster to become the active
> > topics? Similarly for failback.
> >
> > AS5) The only piece that I’m really unsure of is the epoch
> > management. I would have thought that the cluster which currently
> > has the writable topic-partition would be in charge of the leader
> > epoch, and it would not be necessary to perform all of the
> > gymnastics described in the section on epoch rewriting. I have read
> > the Rejected Alternatives section too, but I don’t fully grasp why
> > it was necessary to reject it.
> >
> > I wonder if we could store the “polarity” of an epoch, essentially
> > whether the epoch bump was observed by replication from a source
> > cluster, or whether it was bumped by a local leadership change when
> > the topic is locally writable. When a topic-partition switches from
> > read-only to writable, we should definitely bump the epoch, and we
> > could record the fact that it was a local epoch. When connectivity
> > is re-established, you might find that both ends have declared a
> > local epoch N, but someone has to win.
> >
> > Thanks,
> > Andrew
> >
> > > On 14 Feb 2026, at 07:17, Federico Valeri <[email protected]> wrote:
> > >
> > > Hi, we would like to start a discussion thread about KIP-1279:
> > > Cluster Mirroring.
> > >
> > > Cluster Mirroring is a new Kafka feature that enables native,
> > > broker-level topic replication across clusters. Unlike MirrorMaker 2
> > > (which runs as an external Connect-based tool), Cluster Mirroring is
> > > built into the broker itself, allowing tighter integration with the
> > > controller, coordinator, and partition lifecycle.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
> > >
> > > There are a few missing bits, but most of the design is there, so we
> > > think it is the right time to involve the community and get feedback.
> > > Please help validate our approach.
> > >
> > > Thanks
> > > Fede
> > >
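
To make the AS5 epoch discussion easier to follow, the "separate epoch
domains" design can be sketched as a toy Python model. All names here are
hypothetical (the real logic lives in the broker's fetch path); the point
is only that the destination owns its local leader epoch, while source
epochs are tracked solely to validate fetch responses.

```python
class MirrorPartition:
    """Toy model of destination-side partition state: it owns its local
    leader epoch and only observes source epochs via fetch responses."""
    def __init__(self):
        self.local_leader_epoch = 0   # owned by the destination cluster
        self.last_source_epoch = -1   # observed via fetches, never copied locally

    def on_local_leader_change(self):
        # A local leadership change bumps only the destination epoch;
        # the source epoch domain is untouched.
        self.local_leader_epoch += 1

    def validate_fetch_response(self, source_epoch):
        # A source epoch lower than one already observed indicates a
        # stale or divergent response, so it is rejected.
        if source_epoch < self.last_source_epoch:
            return False
        self.last_source_epoch = source_epoch
        return True
```

In this sketch, bumping `local_leader_epoch` never touches
`last_source_epoch`, which is the separation of epoch domains described
in AS5.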
