Fair point, Chris. I agree with that architectural boundary. KIP-1150
successfully sets the high-level mandate, and we can rigorously tackle the
exact EOS and RPC mechanics over in the KIP-1164 thread.

Andrew, I am fully aligned with you on the massive operational value of
eliminating those cross-AZ replication costs. It is absolutely the right
strategic direction for Kafka.

Since my initial concerns on the storage side are resolved, and we are
aligned on where the transactional interfaces will be finalized, I am
officially withdrawing my objection.
+1 (non-binding) for KIP-1150.

I will migrate my open questions over to the KIP-1164 discussion thread so
we can lock down the data safety details there.
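For reference while these questions move over, here is a deliberately simplified sketch of the LSO rule that the Batch Coordinator would have to reproduce once the partition leader is gone. The LSO is the first offset of the earliest still-open transaction, or the high watermark if no transaction is open. The class and method names (LsoSketch, appendTransactional, completeTransaction) are hypothetical and are not Kafka's ProducerStateManager API. The sketch ignores producer epochs, aborted-transaction tracking, and marker replication.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of how a partition leader derives the
 * Last Stable Offset (LSO). Not Kafka's actual ProducerStateManager code:
 * epochs, aborted-txn indexes and marker replication are ignored.
 */
public class LsoSketch {
    // producerId -> offset of the first batch of that producer's open transaction
    private final Map<Long, Long> ongoingTxnFirstOffset = new HashMap<>();
    private long highWatermark = 0L;

    /** Record a transactional batch for producerId appended at offset. */
    public void appendTransactional(long producerId, long offset) {
        // Only the first offset of an open transaction constrains the LSO.
        ongoingTxnFirstOffset.putIfAbsent(producerId, offset);
    }

    /** A commit or abort marker closes the producer's open transaction. */
    public void completeTransaction(long producerId) {
        ongoingTxnFirstOffset.remove(producerId);
    }

    /** The high watermark only moves forward. */
    public void advanceHighWatermark(long hw) {
        highWatermark = Math.max(highWatermark, hw);
    }

    /** LSO = min(first offset of every open transaction, high watermark). */
    public long lastStableOffset() {
        long lso = highWatermark;
        for (long first : ongoingTxnFirstOffset.values()) {
            lso = Math.min(lso, first);
        }
        return lso;
    }

    public static void main(String[] args) {
        LsoSketch log = new LsoSketch();
        log.appendTransactional(42L, 100L);
        log.appendTransactional(7L, 103L);
        log.appendTransactional(42L, 104L); // later batch of txn 42; first offset stays 100
        log.advanceHighWatermark(105L);
        System.out.println(log.lastStableOffset()); // prints 100
        log.completeTransaction(42L);
        System.out.println(log.lastStableOffset()); // prints 103
        log.completeTransaction(7L);
        System.out.println(log.lastStableOffset()); // prints 105
    }
}
```

In the delayed-data scenario from my earlier message, the open question maps onto this model as: what should the coordinator do if completeTransaction() runs for a producer before all of that transaction's batches have been registered? In this naive sketch, a late batch would simply look like a new open transaction.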

Regards,
Vaquar Khan

On Wed, 25 Feb 2026 at 15:24, Chris Egerton <[email protected]> wrote:

> Hi Vaquar,
>
> > Let me know what you guys think about locking down the text for these
> > interfaces.
>
> I think this KIP has the appropriate level of detail and any concerns about
> EOS can be addressed in the relevant sub-KIP.
>
> Chris
>
> On Wed, Feb 25, 2026 at 4:20 PM vaquar khan <[email protected]> wrote:
>
> > Hi everyone,
> >
> > First off, thanks to the authors for the Feb 12th updates to KIP-1163.
> > Adding the periodic reconciliation loop clears up my concerns about the
> > orphaned "Upload-then-Commit" segments, so I'm officially withdrawing my
> > objection on the storage leak issue.
> >
> > Chris and Greg, since you both mentioned digging into the 1164 details, I
> > wanted to pick your brains on how Exactly-Once Semantics (EOS) is going to
> > safely operate here. In standard Kafka, the Partition Leader is our single
> > serialization point. It receives the data, tracks ongoing transactions via
> > the ProducerStateManager, and calculates the Last Stable Offset (LSO)
> > locally. Since KIP-1150 removes the leader, the Batch Coordinator takes
> > over. But as I read through the current text, a few critical
> > synchronization barriers appear to be missing:
> >
> > 1. LSO Calculation: How exactly will the Batch Coordinator maintain and
> > calculate the LSO? Justine Olshan brought this up earlier too. Will the
> > coordinator run its own ProducerStateManager to track ongoing transactions,
> > or is there a totally different state machine planned?
> >
> > 2. RPC Protocol: What's the exact synchronization protocol between the
> > legacy Transaction Coordinator and the new Batch Coordinator? When the Txn
> > Coordinator sends a commit marker, how does the Batch Coordinator actually
> > verify it has received all the prerequisite data batches for that specific
> > transaction epoch?
> >
> > 3. Delayed Data Race Condition: Let's say a broker hits a GC pause right
> > *after* uploading a batch to object storage, but *before* committing the
> > coordinates. If the transaction commit marker arrives at the Coordinator
> > first, what happens? Does the Coordinator wait? If not, couldn't the
> > transaction commit with missing data, completely violating read_committed
> > isolation?
> >
> > The KIP vaguely mentions *transactional checks*, but leaves the actual
> > commit protocol and public interfaces undefined right now. I'm not saying
> > the design itself is broken, but I really think others and I need to see
> > these RPC flows explicitly documented before we implement and adopt this.
> > Otherwise, we risk baking in some severe data isolation headaches down the
> > line.
> >
> > Let me know what you guys think about locking down the text for these
> > interfaces.
> >
> > Regards,
> > Vaquar Khan
> >
> > On Wed, 25 Feb 2026 at 10:33, Greg Harris via dev <[email protected]>
> > wrote:
> >
> > > Hey all,
> > >
> > > I'm excited to discuss more details in 1163 and 1164 with everyone.
> > >
> > > +1 (binding)
> > >
> > > Thanks!
> > > Greg
> > >
> > > On Wed, Feb 25, 2026 at 1:08 AM Anatolii Popov via dev <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Given the importance of this KIP, we want to keep the vote open for a
> > > > few more days to give people who had comments in the DISCUSS thread
> > > > time to cast their vote if they want.
> > > >
> > > > On Wed, Feb 25, 2026 at 10:47 AM Josep Prat via dev <
> > > > [email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > > As a co-author of the KIP, I want to explicitly cast my vote for
> > > > > this KIP.
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > >
> > > > > On Wed, Feb 25, 2026 at 9:02 AM Luke Chen <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I've re-read KIP-1150, and still agree this is what we need for
> > > > > > Apache Kafka.
> > > > > >
> > > > > > +1 (binding) from me.
> > > > > >
> > > > > > Thank you,
> > > > > > Luke
> > > > > >
> > > > > > On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton <
> > > > > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as
> > > > > >> the relevant discussion threads. I may have granular comments
> > > > > >> about 1163 and 1164, but the overall approach suggested in 1150
> > > > > >> looks good to me. I especially like that the approach covers two
> > > > > >> main pain points of operating and paying for Kafka today: it
> > > > > >> allows cross-AZ traffic to be reduced (even eliminated in some
> > > > > >> cases), and it also allows local disk usage by brokers to be
> > > > > >> reduced (if operators opt for a small local cache on follower
> > > > > >> brokers for non-tiered segments).
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> Cheers,
> > > > > >>
> > > > > >> Chris
> > > > > >>
> > > > > >> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <
> > > > > >> [email protected]>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Josep,
> > > > > >> >
> > > > > >> > Thank you for the detailed response. I appreciate the
> > > > > >> > clarification regarding the distinction between the Inkless POC
> > > > > >> > and the KIP design.
> > > > > >> >
> > > > > >> > However, my objection is not based on temporary bugs in the
> > > > > >> > fork, but *on architectural gaps in the KIPs themselves* that
> > > > > >> > these implementation issues highlighted. If we are voting to
> > > > > >> > approve the design, the design documents must be structurally
> > > > > >> > complete regarding data safety.
> > > > > >> >
> > > > > >> > *1. Regarding Storage Leaks (The Missing Design)* You mentioned
> > > > > >> > that cleanup logic "can be defined later." However, KIP-1163
> > > > > >> > explicitly delegates this responsibility to a separate process,
> > > > > >> > and KIP-1165 (Object Compaction/GC) is currently marked as
> > > > > >> > "Discarded" in the wiki.
> > > > > >> >
> > > > > >> > We cannot vote to approve a storage engine that has no specified
> > > > > >> > mechanism for garbage collection. The "Upload-then-Commit"
> > > > > >> > pattern described in KIP-1163 structurally creates orphaned
> > > > > >> > segments during broker failures. Without an active KIP defining
> > > > > >> > the reconciliation protocol (since KIP-1165 was discarded), the
> > > > > >> > proposal effectively describes a system with unbounded storage
> > > > > >> > growth during failure modes. This is a blocking design gap, not
> > > > > >> > an implementation detail.
> > > > > >> >
> > > > > >> > *2. Regarding EOS (The Coordinator Synchronization Gap)* This is
> > > > > >> > not a misunderstanding of standard Kafka transactions; it is a
> > > > > >> > critique of how KIP-1150 changes them. Standard EOS relies on the
> > > > > >> > Partition Leader to sequence markers and calculate the LSO (Last
> > > > > >> > Stable Offset) in memory. KIP-1150 removes the Leader.
> > > > > >> >
> > > > > >> > KIP-1164 (Batch Coordinator) must explicitly define the RPC flow
> > > > > >> > between the Transaction Coordinator and the Batch Coordinator to
> > > > > >> > replace the leader's role. Currently, the KIP does not specify how
> > > > > >> > the system prevents a "Split Brain" scenario where a consumer
> > > > > >> > reads ahead of a transaction marker that hasn't yet been
> > > > > >> > sequenced by the Batch Coordinator. This is a protocol-level
> > > > > >> > correctness issue that must be resolved in the text before
> > > > > >> > adoption.
> > > > > >> >
> > > > > >> > Please note: I am maintaining my objection based on missing
> > > > > >> > specifications, not code bugs.
> > > > > >> >
> > > > > >> > I respectfully request that we pause the vote until:
> > > > > >> >
> > > > > >> >     - A valid design for Garbage Collection (replacing the
> > > > > >> >       discarded KIP-1165) is added to the proposal.
> > > > > >> >
> > > > > >> >     - The Transaction/LSO synchronization protocol is explicitly
> > > > > >> >       documented in KIP-1164.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> >
> > > > > >> > Vaquar Khan
> > > > > >> > Sr Data Architect
> > > > > >> > https://www.linkedin.com/in/vaquar-khan-b695577/
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > > > --
> > > > > *Josep Prat*
> > > > > Sr. Engineering Director, Streaming Services, *Aiven*
> > > > >
> > > >
> > > >
> > > > --
> > > > Anatolii Popov
> > > > Senior Software Developer, *Aiven OY*
> > > >
> > >
> >
>
