Re: [DISCUSS] KIP-1279: Cluster Mirroring

Mickael Maison Fri, 03 Jul 2026 08:26:52 -0700

Hi,

Thanks for the KIP!
This will definitively be a great addition to Kafka.


MM1: In the description of the main components,
ClusterMirrorCoordinator has this responsibility:
"Metadata Refresh Scheduling: The coordinator schedules periodic
metadata refresh operations by invoking a metadata manager every 30
seconds by default. This ensures that configuration updates and group
offset commits in the source cluster are continuously propagated to
the destination cluster. The refresh interval is configurable via
mirror.metadata.refresh.interval.ms."
Is that accurate because MirrorMetadataManager has pretty much the
same responsibility.

MM2: About topic deletions, the KIP states: "the operator would need
to manually remove the topic from the mirror"
Is there a way to find topics that have been deleted in the source
cluster for an operator to do that action?

MM3: About group offsets: It states: "Groups with active members on
the destination are skipped."
Should these be logged too?

MM4: In the security control section, the DeleteClusterMirror RPC
requires Alter on the ClusterMirror resource. Should we use Delete
instead of Alter?
On non-cluster resources, we typically use Delete for deletion. See
https://kafka.apache.org/43/security/authorization-and-acls/#operations-and-resources-on-protocols

MM5: In Destination cluster permissions, I think FindCoordinator
should also need Describe on the Group resources for regular consumer
groups.

MM6: In Exactly-Once Semantics, it states: "The mirror fetcher thread
uses READ_UNCOMMITTED isolation"
Could this be configurable? If so we can do this in a follow-up KIP

MM7: It seems quotas are not mirrored between the clusters?
Is that omission intentional?

MM8: In StopMirrorTopicsRequest I'm not sure I understand what the
Patterns field is?
Is it a pattern for the topics to stop? If so the description seems odd

MM9: In 
StopMirrorTopicsResponse/StartMirrorTopicsResponse/PauseMirrorTopicsResponse/ResumeMirrorTopicsResponse
do we really need the mirror name?

MM10: In Configuration / Broker:
Can you clarify why we need sasl.mechanism.mirror.admin.protocol?
Can't we retrieve the protocol from mirror.admin.listener.name or
inter.broker.listener.name?
If we really need it, should it be named
mirror.admin.sasl.mechanism.protocol to match the other names?

MM11: In Configuration / Mirror
Should it be mirror.acls.include instead of mirror.acl.include to
match mirror.groups.include and mirror.topics.include?

MM12: Is the description of mirror.support.unclean.leader.election inverted?

MM13: The description of CLUSTER_MIRROR_NOT_EMPTY states: "The cluster
mirror still has active or non-removed topics"
Should it be "The cluster mirror still has active or non-stopped topics"?

MM14: We have the MIRROR_TOPIC_BEING_STOPPED error when stopping topic
already in stopping state? What happens when pausing topics already in
pausing state?
Related, I assume when stopping a stopped topic, you get
TOPIC_NOT_IN_CLUSTER_MIRROR

Thanks,
Mickael

On Mon, Jun 22, 2026 at 7:47 PM Andrew Schofield <[email protected]> wrote:
>
> Hi Fede,
> Thanks for your replies.
>
> AS30: Yes, that's right. Two different end markers for the same transaction, 
> and essentially unbounded duration of the inconsistency since KIP-939. I 
> agree that it's a class of transaction state divergence, like KAFKA-20716. 
> For KIP-1279, the most important thing is that consumers of the mirrored 
> topic must not get confused by the markers. For example, when the Fetch 
> response contains the list of aborted transactions, provided we are sure that 
> it can be built properly without confusion from the extra marker, then we are 
> good. It came to mind because the way this is represented in the protocol for 
> the responses is somewhat fragile I feel, so I'm a bit paranoid that a 
> mis-alignment could break it.
>
> Thanks,
> Andrew
>
> On 2026/06/19 12:16:52 Federico Valeri wrote:
> > Hi Andrew, me, again.
> >
> > Given that READ_COMMITTED consumers are unaffected because the ABORT
> > is already recorded in the transaction index and the dangling COMMIT
> > produces no CompletedTxn (no index and LSO updates), the practical
> > impact is limited. However, __transaction_state records the
> > transaction as COMMITTED while the partition log has it as aborted.
> > While analyzing this, we discovered that this type of divergence is
> > not unique to mirroring. KAFKA-20716 documents the same category of
> > inconsistency after unclean leader elections, where lost transaction
> > markers cause the LSO to get stuck (much bigger problem). We believe
> > that solving the transaction state divergence issue would also close
> > this mirroring scenario. Wdyt?
> >
> > On Thu, Jun 18, 2026 at 10:36 PM Federico Valeri <[email protected]> 
> > wrote:
> > >
> > > Hi Andrew, about AS30, I had a closer look at KIP-939, and I think
> > > this is what you mean:
> > >
> > > The problem is that a WriteTxnMarker request for a PID unknown to the
> > > ProducerStateManager (cleared by MIRROR_PID_RESET) does not fail. The
> > > end transaction marker is still appended, but transaction index and
> > > LSO are unaffected (ProducerStateManager.prepareUpdate,
> > > ProducerAppendInfo.appendEndTxnMarker). This may lead to a situation
> > > where we have two different end txn markers for the same transaction:
> > > ABORT from the stop mirroring operation, followed by a COMMIT from the
> > > local Txn Coordinator retrying its WriteTxnMarker after the partition
> > > becomes writable again.
> > >
> > > READ_COMMITTED consumers still see the earlier ABORT in the
> > > transaction index and skip the aborted data. The visible outcome for
> > > consumers is correct. However, __transaction_state on the original
> > > source cluster now records the transaction as COMMITTED while the
> > > partition log has it as aborted, creating a silent inconsistency that
> > > tools querying transaction state will not detect. The window for this
> > > race is bounded by transaction.max.timeout.ms (default 15 minutes),
> > > but becomes unbounded with KIP-939 because it allows prepared
> > > transactions to sit indefinitely.
> > >
> > > We could move the coordinator state to “complete abort” at the cost of
> > > an RPC. This would prevent any retry, but that would break the 2PC
> > > contract (Kafka must hold the prepared state until the external
> > > coordinator decides). Also, before stop there would be a number of
> > > retries constantly going on. We definitely need a better solution.
> > >
> > > On Thu, Jun 18, 2026 at 11:36 AM Federico Valeri <[email protected]> 
> > > wrote:
> > > >
> > > > Hi Andrew, thanks for your careful review.
> > > >
> > > > AS28: We reworked that section and added some diagrams to illustrate
> > > > problem and solutions. Let us know if it is better.
> > > >
> > > > AS30: We updated that table, but we are not sure about the problem you
> > > > mention. Can you clarify and send us the sequence of actions that
> > > > would lead to that problem?
> > > >
> > > > AS32: You are right. Replaced StartMirrorTopicsRequestData.TopicData
> > > > with a Controller.MirrorTopicMetadata record. Also renamed TopicData
> > > > to TopicMetadata in all schemas.
> > > >
> > > > AS33: Good idea. Added per-topic error messages in the RPC responses
> > > > you mentioned.
> > > >
> > > > AS29, AS31, AS34 (both of them), AS35, AS36: These small issues are
> > > > all fixed now.
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Jun 9, 2026 at 9:58 PM Andrew Schofield <[email protected]> 
> > > > wrote:
> > > > >
> > > > > Hi Fede and friends,
> > > > > This KIP is shaping up nicely. The change to the way that state 
> > > > > management for the mirrors is done is much more elegant. The removal 
> > > > > of the PID mapping is also a nice improvement.
> > > > >
> > > > > Inevitably, I have some more comments.
> > > > >
> > > > > AS28: I understand KAFKA-18723 and how leader epochs work in that 
> > > > > case, but the enhancements that you made to the Fetch RPC and how 
> > > > > they work escapes me. Please could we have some diagrams to 
> > > > > illustrate the various epochs?
> > > > >
> > > > > AS29: I like the addition of share group offset syncing. You need 
> > > > > AlterShareGroupOffsets and DescribeShareGroupOffsets in the tables of 
> > > > > RPCs and permissions.
> > > > >
> > > > > AS30: It is certainly true that EOS is not supported. Using cluster 
> > > > > mirroring on transactional data does risk breaking transactional 
> > > > > atomicity for transactions in flight at the time that mirroring was 
> > > > > stopped. That's a known limitation, and it's an effect of per-topic 
> > > > > async replication. Writing ABORT markers when stop mirroring is 
> > > > > triggered is interesting. I just wonder whether it's truly safe.
> > > > >
> > > > > Could you enhance the tables of records in the Exactly-Once Semantics 
> > > > > section to include leader epochs? I think that the leader epoch for 
> > > > > the ABORT markers will have been bumped as the mirroring stopped and 
> > > > > the partitions became writable. However, I worry that someone might 
> > > > > switch the mirroring back in the other direction before the 
> > > > > transaction which was rudely partially aborted has completed on the 
> > > > > original source cluster, and that the effect of the transaction 
> > > > > coordinator completing the transaction, potentially writing duplicate 
> > > > > transaction markers after the ABORT markers which will have been 
> > > > > replicated back again.
> > > > >
> > > > > This is probably only a realistic possibility when KIP-939 is 
> > > > > present, but it is accepted and I'm sure it will land soon enough. I 
> > > > > would say this KIP should be prepared for the existence of 
> > > > > long-running transactions so we don't dig a hole to fall into later.
> > > > >
> > > > > AS31: In the section on the Admin Client, the description for 
> > > > > Admin.createClusterMirror(String, Map, CreateClusterMirrorOptions) is 
> > > > > duplicated.
> > > > >
> > > > > AS32: The use of StartMirrorTopicsRequestData.TopicData in an 
> > > > > external interface looks odd.
> > > > >
> > > > > AS33: I suggest per-topic error messages in the RPC responses such as 
> > > > > PauseMirrorTopics so you can pass back more context, such as the 
> > > > > state if the request failed due to a state mismatch.
> > > > >
> > > > > AS34: There is an unnamed field in DescribeClusterMirrorsResponse. 
> > > > > The same in LastMirrorEpochsValue.
> > > > >
> > > > > AS34: Some of the fields in BumpLeaderEpochsRequest start with 
> > > > > lower-case letters. Also, there are quite a few fields in the RPCs in 
> > > > > general that should be version "0+" which are "0".
> > > > >
> > > > > AS35: Sometimes the partition states in the RPCs are string, but 
> > > > > sometimes int8.
> > > > >
> > > > > AS36: The base level for share group support is probably 4.1. All of 
> > > > > the RPCs were present in 4.1, although they were only enabled by 
> > > > > default in 4.2.
> > > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > > On 2026/06/04 11:01:36 Federico Valeri wrote:
> > > > > > Hello, additional updates and replies:
> > > > > >
> > > > > > VK12: Consumer data loss risk is a good point. We updated the "Group
> > > > > > Offsets" paragraph.
> > > > > >
> > > > > > Some of you raised the point that using mirror.name suffixes for 
> > > > > > pause
> > > > > > and stop operations is a bit inelegant and we agreed. Now we 
> > > > > > propose a
> > > > > > better approach:
> > > > > >
> > > > > > We replaced the topic config based approach (mirror.name and
> > > > > > .stopped/.paused suffix conventions) with a first class metadata
> > > > > > record for tracking mirror topic state changes. A new
> > > > > > MirrorTopicStateChangeRecord is added to the metadata log with three
> > > > > > fields: TopicId (uuid), MirrorName (string), and DesiredState (int8,
> > > > > > where 0 = start mirroring, 1 = stop, 2 = pause). When a user calls
> > > > > > startMirrorTopics, stopMirrorTopics, or pauseMirrorTopics, the
> > > > > > controller writes this record to the metadata log instead of
> > > > > > manipulating config suffixes. Brokers receive the record via 
> > > > > > metadata
> > > > > > updates and react accordingly, triggering partition level state
> > > > > > transitions through the existing MirrorPartitionState state machine.
> > > > > > More details in the updated KIP.
> > > > > >
> > > > > > Let us know what you think.
> > > > > >
> > > > > > Thanks
> > > > > > Fede
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 2, 2026 at 8:50 AM Luke Chen <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Viquar,
> > > > > > >
> > > > > > > Thanks for the comment.
> > > > > > >
> > > > > > > Regarding VK12, thanks for raising the issue.
> > > > > > > We didn't think about that.
> > > > > > > I agree that compared with data re-processing, it's worse to have 
> > > > > > > data loss.
> > > > > > > The proposed formula makes sense to me.
> > > > > > > *syncedOffset = max(destinationLogStartOffset, 
> > > > > > > min(sourceCommitted,
> > > > > > > destinationLEO))*
> > > > > > >
> > > > > > > Let us have some discussion internally and then update the KIP 
> > > > > > > and reply to
> > > > > > > the thread.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, May 20, 2026 at 2:39 AM Federico Valeri 
> > > > > > > <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Rajini,
> > > > > > > >
> > > > > > > > RS14: Done. Missed that, sorry.
> > > > > > > >
> > > > > > > > Thanks again.
> > > > > > > >
> > > > > > > > On Tue, May 19, 2026 at 7:28 PM Rajini Sivaram 
> > > > > > > > <[email protected]>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Federico,
> > > > > > > > >
> > > > > > > > > Thanks for the updates. Just one minor point, apart from 
> > > > > > > > > that, looks
> > > > > > > > good.
> > > > > > > > >
> > > > > > > > > RS14: KIP still shows "bin/kafka-configs.sh 
> > > > > > > > > --bootstrap-server :9094
> > > > > > > > > --entity-type mirrors".
> > > > > > > > > Will be good to change that to `--entity-type 
> > > > > > > > > cluster-mirrors`.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Rajini
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, May 19, 2026 at 5:28 PM vaquar khan 
> > > > > > > > > <[email protected]>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Federico and team,
> > > > > > > > > >
> > > > > > > > > > Thank you for your detailed response on May 11, 2026. I 
> > > > > > > > > > greatly
> > > > > > > > appreciate
> > > > > > > > > > the collaborative effort over the past few months to harden 
> > > > > > > > > > the
> > > > > > > > KIP-1279
> > > > > > > > > > architecture.
> > > > > > > > > >
> > > > > > > > > > * 1. Resolved Architectural & Stability Vulnerabilities*
> > > > > > > > > >
> > > > > > > > > > I am pleased to confirm that the KIP has successfully 
> > > > > > > > > > integrated fixes
> > > > > > > > for
> > > > > > > > > > the critical vulnerabilities I flagged, thereby protecting 
> > > > > > > > > > the
> > > > > > > > cluster's
> > > > > > > > > > state machine, memory bounds, and control plane. Consider 
> > > > > > > > > > the following
> > > > > > > > > > items fully resolved on my end:
> > > > > > > > > >
> > > > > > > > > >    -
> > > > > > > > > >
> > > > > > > > > >    *VK4 / VK8 (Negative PID Bug & State Machine Failure):* 
> > > > > > > > > > I previously
> > > > > > > > > >    identified that the proposed -(sourceProducerId + 2) PID 
> > > > > > > > > > rewriting
> > > > > > > > > >    formula would fundamentally break hasProducerId() in
> > > > > > > > > >    AbstractRecordBatch.java, causing the broker to bypass
> > > > > > > > > >    ProducerStateManager.update() and default the Last 
> > > > > > > > > > Stable Offset
> > > > > > > > (LSO)
> > > > > > > > > >    to the High Watermark. I am glad to see the KIP authors 
> > > > > > > > > > abandoned
> > > > > > > > PID
> > > > > > > > > >    mapping entirely and adopted the deterministic 
> > > > > > > > > > MIRROR_PID_RESET
> > > > > > > > control
> > > > > > > > > >    record barrier. This perfectly protects Kafka's 
> > > > > > > > > > exactly-once
> > > > > > > > semantics.
> > > > > > > > > >
> > > > > > > > > >    -
> > > > > > > > > >
> > > > > > > > > >    *VK11 (TransactionIndex Rebuild Ambiguity):* Thank you 
> > > > > > > > > > for
> > > > > > > > officially
> > > > > > > > > >    confirming that the TransactionIndex is strictly rebuilt 
> > > > > > > > > > locally
> > > > > > > > during
> > > > > > > > > >    log appends rather than copied byte-for-byte. This 
> > > > > > > > > > resolves my
> > > > > > > > concern
> > > > > > > > > >    regarding PID mismatches inducing consumer read 
> > > > > > > > > > anomalies.
> > > > > > > > > >    -
> > > > > > > > > >
> > > > > > > > > >    *VK1 (Thundering Herd / OOM Heap Allocation):* Your 
> > > > > > > > > > clarification
> > > > > > > > that
> > > > > > > > > >    fetcher threads multiplex partitions meaning the memory 
> > > > > > > > > > footprint is
> > > > > > > > > >    strictly bounded by num_fetcher_threads * 
> > > > > > > > > > response_max_bytes rather
> > > > > > > > than
> > > > > > > > > >    a concurrent 1MB buffer per partition fully resolves my 
> > > > > > > > > > concern
> > > > > > > > > > regarding
> > > > > > > > > >    50GB broker-wide OOM spikes during mass partition 
> > > > > > > > > > wake-ups.
> > > > > > > > > >    -
> > > > > > > > > >
> > > > > > > > > >    *VK3 (Control Plane Hotspots):* I had flagged the severe 
> > > > > > > > > > risk of
> > > > > > > > > >    metadata saturation on a single broker during a "link 
> > > > > > > > > > flap" event.
> > > > > > > > Your
> > > > > > > > > >    confirmation that the __mirror_state topic utilizes a 
> > > > > > > > > > compound hash
> > > > > > > > > > of (mirrorName,
> > > > > > > > > >    topicId, partition number) mathematically ensures that 
> > > > > > > > > > the 50,000+
> > > > > > > > state
> > > > > > > > > >    transitions will be safely distributed across the 
> > > > > > > > > > cluster,
> > > > > > > > neutralizing
> > > > > > > > > > the
> > > > > > > > > >    single-node hotspot risk.
> > > > > > > > > >
> > > > > > > > > > 2. Outstanding Critical Blocker: VK12 (Offset Sync Data 
> > > > > > > > > > Loss)
> > > > > > > > > >
> > > > > > > > > > Regarding VK12, there is a fundamental misunderstanding in 
> > > > > > > > > > your
> > > > > > > > previous
> > > > > > > > > > reply. You stated: *"The scenario described can only occur 
> > > > > > > > > > if offsets
> > > > > > > > are
> > > > > > > > > > force-written to an active group, which the design 
> > > > > > > > > > prevents."*
> > > > > > > > > >
> > > > > > > > > > My concern has absolutely nothing to do with overwriting 
> > > > > > > > > > active
> > > > > > > > groups. My
> > > > > > > > > > concern applies strictly to the normal synchronization of 
> > > > > > > > > > inactive/dead
> > > > > > > > > > groups, and is based directly on the race condition 
> > > > > > > > > > currently
> > > > > > > > documented in
> > > > > > > > > > the official KIP-1279 text:
> > > > > > > > > >
> > > > > > > > > > *"During offset synchronization, the committed offset in the
> > > > > > > > destination
> > > > > > > > > > cluster may temporarily exceed the current log end offset 
> > > > > > > > > > (LEO) of the
> > > > > > > > > > mirror topic... consumers attempting to resume from offset 
> > > > > > > > > > 100 will
> > > > > > > > receive
> > > > > > > > > > an OffsetOutOfRangeException. To handle this gracefully, 
> > > > > > > > > > consumers
> > > > > > > > should
> > > > > > > > > > configure auto.offset.reset=latest..."*
> > > > > > > > > >
> > > > > > > > > > If a failover happens during this documented divergence 
> > > > > > > > > > window, a
> > > > > > > > > > reconnecting consumer will hit the 
> > > > > > > > > > OffsetOutOfRangeException. If the
> > > > > > > > > > downstream consumer follows the KIP's official advice and 
> > > > > > > > > > relies on
> > > > > > > > > > auto.offset.reset=latest, the consumer will jump to the 
> > > > > > > > > > absolute newest
> > > > > > > > > > offset on the partition. *This completely skips any newly 
> > > > > > > > > > produced
> > > > > > > > records
> > > > > > > > > > that arrived between the failover and the consumer 
> > > > > > > > > > reconnecting,
> > > > > > > > resulting
> > > > > > > > > > in silent, unrecoverable data loss for the downstream 
> > > > > > > > > > application.*
> > > > > > > > > > *Proposed Resolution: Double-Clamped Offset Safety 
> > > > > > > > > > Invariant* Instead
> > > > > > > > of
> > > > > > > > > > requiring consumers to use auto.offset.reset=latest and 
> > > > > > > > > > endorsing a
> > > > > > > > known
> > > > > > > > > > data loss vector, I propose that the 
> > > > > > > > > > ClusterMirrorCoordinator enforce a
> > > > > > > > > > pre-persist offset validation invariant during offset 
> > > > > > > > > > synchronization.
> > > > > > > > > > Before any translated offset is committed to the 
> > > > > > > > > > destination cluster,
> > > > > > > > the
> > > > > > > > > > coordinator must apply a double-clamp:
> > > > > > > > > >
> > > > > > > > > > *syncedOffset = max(destinationLSO, min(sourceCommitted,
> > > > > > > > destinationLEO))*
> > > > > > > > > >
> > > > > > > > > > Where destinationLSO is the Log Start Offset (earliest 
> > > > > > > > > > readable
> > > > > > > > position
> > > > > > > > > > post-retention) and destinationLEO is the Log End Offset 
> > > > > > > > > > (latest
> > > > > > > > replicated
> > > > > > > > > > position) of the destination partition.
> > > > > > > > > >
> > > > > > > > > > This guarantees that every persisted offset falls within 
> > > > > > > > > > the physically
> > > > > > > > > > valid range  eliminating both the OffsetOutOfRangeException 
> > > > > > > > > > caused by
> > > > > > > > > > exceeding the LEO during replication lag, and the 
> > > > > > > > > > expired-offset rewind
> > > > > > > > > > caused by falling below the LSO due to destination 
> > > > > > > > > > retention policies.
> > > > > > > > > >
> > > > > > > > > > The worst-case trade-off under this invariant is bounded 
> > > > > > > > > > re-processing
> > > > > > > > > > proportional to the replication lag at failover time not 
> > > > > > > > > > total
> > > > > > > > partition
> > > > > > > > > > depth which is perfectly consistent with Kafka's documented
> > > > > > > > at-least-once
> > > > > > > > > > delivery guarantees. Broad enterprise production evidence 
> > > > > > > > > > from
> > > > > > > > large-scale
> > > > > > > > > > cross-cluster failovers confirms that state checks alone 
> > > > > > > > > > (preventing
> > > > > > > > active
> > > > > > > > > > group overwrites) are insufficient; strict offset bounds 
> > > > > > > > > > clamping is
> > > > > > > > > > required to achieve enterprise-grade data integrity.
> > > > > > > > > >
> > > > > > > > > > I look forward to your thoughts on implementing this final 
> > > > > > > > > > offset
> > > > > > > > capping
> > > > > > > > > > logic. Once VK12 is patched, I believe this architecture 
> > > > > > > > > > will be
> > > > > > > > > > exceptionally robust and ready for enterprise deployment.
> > > > > > > > > >
> > > > > > > > > > Best Regards,
> > > > > > > > > >
> > > > > > > > > > Viquar Khan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 May 2026 at 14:23, Rajini Sivaram 
> > > > > > > > > > <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Federico,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the updates! The KIP is looking good. A few 
> > > > > > > > > > > more small
> > > > > > > > > > comments.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > RS13: A couple of places still refer to `kafka-mirror.sh` 
> > > > > > > > > > > like under
> > > > > > > > > > > `Failover Process`. Can we change them to
> > > > > > > > `*kafka-cluster-mirrors.sh*`?
> > > > > > > > > > >
> > > > > > > > > > > RS14: Should we change `--entity-type mirrors` for 
> > > > > > > > > > > kafka-configs to
> > > > > > > > be `
> > > > > > > > > > > --entity-type *cluster-mirrors*` to be consistent? Also,
> > > > > > > > > > > CLUSTER_MIRROR((byte)
> > > > > > > > > > > 64, "mirror"); could be `*cluster-mirror*`?
> > > > > > > > > > >
> > > > > > > > > > > RS15: It may be useful to rename 
> > > > > > > > > > > `mirror.topic.num.partitions` and `
> > > > > > > > > > > mirror.topic.replication.factor` since they are very 
> > > > > > > > > > > similar to `
> > > > > > > > > > > mirror.topic.properties.exclude`, but the `mirror.topic` 
> > > > > > > > > > > prefix
> > > > > > > > refers to
> > > > > > > > > > > different topics (the internal topic for the first two 
> > > > > > > > > > > and actual
> > > > > > > > mirror
> > > > > > > > > > > topics for the other one).
> > > > > > > > > > >
> > > > > > > > > > > RS16: ACL Sync: KIP says "Deletes ACLs that exist in 
> > > > > > > > > > > destination but
> > > > > > > > not
> > > > > > > > > > in
> > > > > > > > > > > source using DeleteAcls request."
> > > > > > > > > > > What happens if someone creates an ACL on the destination 
> > > > > > > > > > > to deny
> > > > > > > > > > > User:Alice access to all topics?
> > > > > > > > > > >
> > > > > > > > > > >    1. If that ACL also existed on the source cluster and 
> > > > > > > > > > > then it was
> > > > > > > > > > >    removed, will it get removed from the destination?
> > > > > > > > > > >    2. If that ACL never existed on the source cluster, 
> > > > > > > > > > > will it get
> > > > > > > > > > removed
> > > > > > > > > > >    from the destination?
> > > > > > > > > > >
> > > > > > > > > > > RS17: The table in the Source ACLs section says:
> > > > > > > > "DescribeClusterMirrors
> > > > > > > > > > >    MC      Read    Cluster     Log truncation"
> > > > > > > > > > > Should that be `ClusterMirror:Read` instead of 
> > > > > > > > > > > `Cluster:Read`?
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Rajini
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 15, 2026 at 8:41 AM Federico Valeri <
> > > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Rajini, we finally addressed the API and tool naming
> > > > > > > > refactoring as
> > > > > > > > > > > > you suggested in RS6. Please take a look when you have 
> > > > > > > > > > > > time.
> > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 11, 2026 at 6:17 PM Federico Valeri <
> > > > > > > > [email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello all, I want to highlight a couple of new 
> > > > > > > > > > > > > paragraphs:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Leader Epoch Invariant: Cluster mirroring enforces 
> > > > > > > > > > > > > the
> > > > > > > > invariant
> > > > > > > > > > > > > that the destination leader epoch must always be 
> > > > > > > > > > > > > greater than or
> > > > > > > > > > equal
> > > > > > > > > > > > > to the source leader epoch (DLE>=SLE). Without this, 
> > > > > > > > > > > > > consumers
> > > > > > > > on the
> > > > > > > > > > > > > destination cluster can get stuck in an infinite 
> > > > > > > > > > > > > metadata refresh
> > > > > > > > > > loop
> > > > > > > > > > > > > when they encounter committed offsets carrying source 
> > > > > > > > > > > > > epochs
> > > > > > > > higher
> > > > > > > > > > > > > than the local epoch. The invariant is maintained 
> > > > > > > > > > > > > through three
> > > > > > > > > > > > > mechanisms: reactive bumping (epoch fencing triggered 
> > > > > > > > > > > > > when SLE >
> > > > > > > > DLE
> > > > > > > > > > > > > during fetch), proactive bumping (scheduled when SLE 
> > > > > > > > > > > > > approaches
> > > > > > > > DLE
> > > > > > > > > > > > > within a threshold), and periodic bumping (checked 
> > > > > > > > > > > > > during
> > > > > > > > coordinator
> > > > > > > > > > > > > metadata sync).
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620973#KIP1279:ClusterMirroring-LeaderEpochInvariant
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Group Offsets: The coordinator periodically syncs 
> > > > > > > > > > > > > consumer and
> > > > > > > > > > > > > share group offsets from the source cluster to the 
> > > > > > > > > > > > > destination
> > > > > > > > for
> > > > > > > > > > all
> > > > > > > > > > > > > mirrored topics. Groups are filtered by configurable
> > > > > > > > include/exclude
> > > > > > > > > > > > > patterns, and offsets are only synced for groups that 
> > > > > > > > > > > > > are not
> > > > > > > > > > > > > currently active on the destination cluster, 
> > > > > > > > > > > > > preventing
> > > > > > > > overwrites of
> > > > > > > > > > > > > local consumer progress. Because source and 
> > > > > > > > > > > > > destination share the
> > > > > > > > > > same
> > > > > > > > > > > > > topic offsets (no offset translation), synced offsets 
> > > > > > > > > > > > > can be used
> > > > > > > > > > > > > directly without mapping.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620973#KIP1279:ClusterMirroring-GroupOffsets
> > > > > > > > > > > > >
> > > > > > > > > > > > > These new paragraphs directly address some of your 
> > > > > > > > > > > > > questions,
> > > > > > > > but let
> > > > > > > > > > > > > me list them here:
> > > > > > > > > > > > >
> > > > > > > > > > > > > JR2: Yes, we removed the incorrect phrase and added 
> > > > > > > > > > > > > more details
> > > > > > > > to
> > > > > > > > > > > > > the paragraph.
> > > > > > > > > > > > >
> > > > > > > > > > > > > JR4: When source cluster topic has tiered storage 
> > > > > > > > > > > > > enabled, CM
> > > > > > > > works
> > > > > > > > > > by
> > > > > > > > > > > > > mirroring remote and local log into destination 
> > > > > > > > > > > > > cluster. When
> > > > > > > > > > > > > destination cluster topic has tiered storage enabled, 
> > > > > > > > > > > > > CM fails in
> > > > > > > > > > > > > PREPARING state because the LME may be in remote 
> > > > > > > > > > > > > storage, but
> > > > > > > > works
> > > > > > > > > > > > > fine if already MIRRORING because no truncation is 
> > > > > > > > > > > > > needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > JR11: See "Leader Epoch Invariant" paragraph 
> > > > > > > > > > > > > mentioned above.
> > > > > > > > > > > > >
> > > > > > > > > > > > > JR13: We can't support stateful Streams application 
> > > > > > > > > > > > > because
> > > > > > > > > > > > > asynchronous replication cannot preserve the 
> > > > > > > > > > > > > transactional
> > > > > > > > boundaries
> > > > > > > > > > > > > between input offset commits, state store mutations 
> > > > > > > > > > > > > written to
> > > > > > > > > > > > > changelog topics, and intermediate records written to 
> > > > > > > > > > > > > repartition
> > > > > > > > > > > > > topics. The synchronous extension of this design will 
> > > > > > > > > > > > > be able to
> > > > > > > > > > > > > support them. Existing Features Integration paragraph 
> > > > > > > > > > > > > updated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > JR18: See "Group Offsets" paragraph mentioned above.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IY1: See "Group Offsets" paragraph mentioned above.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Fede
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 11, 2026 at 6:08 PM Federico Valeri <
> > > > > > > > > > [email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Vaquar,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > VK4/VK8: We don't do PID mapping anymore. The KIP 
> > > > > > > > > > > > > > was updated
> > > > > > > > some
> > > > > > > > > > > > > > time ago with the new approach based on the new PID 
> > > > > > > > > > > > > > reset
> > > > > > > > control
> > > > > > > > > > > > > > record.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > VK11: The transaction index is always built locally 
> > > > > > > > > > > > > > during log
> > > > > > > > > > > append,
> > > > > > > > > > > > > > never copied.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > VK1: The 50,000 * 1MB = 50GB calculation 
> > > > > > > > > > > > > > misunderstands the
> > > > > > > > fetch
> > > > > > > > > > > > > > model. Fetcher threads don't allocate one buffer 
> > > > > > > > > > > > > > per partition.
> > > > > > > > > > > Actual
> > > > > > > > > > > > > > peak memory is roughly num_fetcher_threads *
> > > > > > > > response_max_bytes,
> > > > > > > > > > not
> > > > > > > > > > > > > > num_partitions * partition_max_bytes. With 1 
> > > > > > > > > > > > > > fetcher thread
> > > > > > > > and the
> > > > > > > > > > > > > > default response max, the memory footprint is modest
> > > > > > > > regardless of
> > > > > > > > > > > > > > partition count. We are leveraging the same proven 
> > > > > > > > > > > > > > pattern
> > > > > > > > used by
> > > > > > > > > > > the
> > > > > > > > > > > > > > internal replication.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > VK3: The __mirror_state topic uses hash-based 
> > > > > > > > > > > > > > partitioning
> > > > > > > > based on
> > > > > > > > > > > > > > mirrorName, topicId and partition number. With the 
> > > > > > > > > > > > > > production
> > > > > > > > > > default
> > > > > > > > > > > > > > of 50 partitions, 50,000 partition transitions 
> > > > > > > > > > > > > > distribute
> > > > > > > > across
> > > > > > > > > > ~50
> > > > > > > > > > > > > > partition leaders on different brokers, not a 
> > > > > > > > > > > > > > single broker.
> > > > > > > > This
> > > > > > > > > > is
> > > > > > > > > > > > > > the same proven pattern as __consumer_offsets, 
> > > > > > > > > > > > > > which handles
> > > > > > > > > > millions
> > > > > > > > > > > > > > of commits.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > VK12: The scenario described can only occur if 
> > > > > > > > > > > > > > offsets are
> > > > > > > > > > > > > > force-written to an active group, which the design 
> > > > > > > > > > > > > > prevents.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > > Fede
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> >

Re: [DISCUSS] KIP-1279: Cluster Mirroring

Reply via email to