+1 (binding)

> On Feb 28, 2026, at 11:00 AM, Satish Duggana <[email protected]> wrote:
> 
> Thanks for the KIP.
> I've reviewed the updated KIP and agree with the motivation behind
> KIP-1150; overall LGTM.
> It seems KIP-1163 and KIP-1164 require more details, which we can discuss
> in those respective threads.
> 
> +1 (binding) for KIP-1150.
> 
> ~Satish.
> 
>> On Fri, 27 Feb 2026 at 23:28, Jun Rao via dev <[email protected]> wrote:
>> 
>> Hi, Anatolii,
>> 
>> Thanks for the KIP. The link you posted for KIP-1150 seems incorrect and it
>> points to KIP-1163. Otherwise, +1.
>> 
>> Jun
>> 
>>> On Wed, Feb 25, 2026 at 2:59 PM vaquar khan <[email protected]> wrote:
>>> 
>>> Fair point, Chris. I agree with that architectural boundary. KIP-1150
>>> successfully sets the high-level mandate, and we can rigorously tackle the
>>> exact EOS and RPC mechanics over in the KIP-1164 thread.
>>> 
>>> Andrew, I am fully aligned with you on the massive operational value of
>>> eliminating those cross-AZ replication costs. It is absolutely the right
>>> strategic direction for Kafka.
>>> 
>>> Since my initial concerns on the storage side are resolved, and we are
>>> aligned on where the transactional interfaces will be finalized, I am
>>> officially withdrawing my objection.
>>> +1 (non-binding) for KIP-1150.
>>> 
>>> I will migrate my open questions over to the KIP-1164 discussion thread so
>>> we can lock down the data safety details there.
>>> 
>>> Regards,
>>> Vaquar Khan
>>> 
>>> On Wed, 25 Feb 2026 at 15:24, Chris Egerton <[email protected]>
>>> wrote:
>>> 
>>>> Hi Vaquar,
>>>> 
>>>>> Let me know what you guys think about locking down the text for these
>>>>> interfaces.
>>>> 
>>>> I think this KIP has the appropriate level of detail and any concerns about
>>>> EOS can be addressed in the relevant sub-KIP.
>>>> 
>>>> Chris
>>>> 
>>>> On Wed, Feb 25, 2026 at 4:20 PM vaquar khan <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> First off, thanks to the authors for the Feb 12th updates to KIP-1163.
>>>>> Adding the periodic reconciliation loop clears up my concerns about the
>>>>> orphaned "Upload-then-Commit" segments, so I'm officially withdrawing my
>>>>> objection on the storage leak issue.
>>>>> 
>>>>> Chris and Greg, since you both mentioned digging into the 1164 details, I
>>>>> wanted to pick your brains on how Exactly-Once Semantics (EOS) is going to
>>>>> safely operate here. In standard Kafka, the Partition Leader is our single
>>>>> serialization point. It receives the data, tracks ongoing transactions via
>>>>> the ProducerStateManager, and calculates the Last Stable Offset (LSO)
>>>>> locally. Since KIP-1150 removes the leader, the Batch Coordinator takes
>>>>> over. But as I read through the current text, a few critical
>>>>> synchronization barriers seem to me to be missing:
>>>>> 
>>>>> 1. LSO Calculation: How exactly will the Batch Coordinator maintain and
>>>>> calculate the LSO? Justine Olshan brought this up earlier too. Will the
>>>>> coordinator run its own ProducerStateManager to track ongoing transactions,
>>>>> or is there a totally different state machine planned?
>>>>> 
>>>>> 2. RPC Protocol: What's the exact synchronization protocol between the
>>>>> legacy Transaction Coordinator and the new Batch Coordinator? When the Txn
>>>>> Coordinator sends a commit marker, how does the Batch Coordinator actually
>>>>> verify it has received all the prerequisite data batches for that specific
>>>>> transaction epoch?
>>>>> 
>>>>> 3. Delayed Data Race Condition: Let's say a broker hits a GC pause right
>>>>> *after* uploading a batch to object storage, but *before* committing the
>>>>> coordinates. If the transaction commit marker arrives at the Coordinator
>>>>> first, what happens? Does the Coordinator wait? If not, couldn't the
>>>>> transaction commit with missing data, completely violating read_committed
>>>>> isolation?
>>>>> 
>>>>> The KIP vaguely mentions *transactional checks* but leaves the actual
>>>>> commit protocol and public interfaces undefined right now. I'm not saying
>>>>> the design itself is broken, but I really think others and I need to
>>>>> see these RPC flows explicitly documented before we implement and adopt
>>>>> this. Otherwise, we risk baking in some severe data isolation headaches
>>>>> down the line.
>>>>> 
>>>>> Let me know what you guys think about locking down the text for these
>>>>> interfaces.
>>>>> 
>>>>> Regards,
>>>>> Vaquar Khan
>>>>> 
>>>>> On Wed, 25 Feb 2026 at 10:33, Greg Harris via dev <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hey all,
>>>>>> 
>>>>>> I'm excited to discuss more details in 1163 and 1164 with everyone.
>>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> Thanks!
>>>>>> Greg
>>>>>> 
>>>>>> On Wed, Feb 25, 2026 at 1:08 AM Anatolii Popov via dev <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Given the importance of this KIP, we want to keep the vote open for a few
>>>>>>> more days to give time to people who had comments in the DISCUSS thread to
>>>>>>> cast their vote if they want.
>>>>>>> 
>>>>>>> On Wed, Feb 25, 2026 at 10:47 AM Josep Prat via dev <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> As a co-author of the KIP, I want to explicitly cast my vote for this
>>>>>>>> KIP.
>>>>>>>> 
>>>>>>>> +1 (binding)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Feb 25, 2026 at 9:02 AM Luke Chen <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> I've re-read KIP-1150, and still agree this is what we need for Apache
>>>>>>>>> Kafka.
>>>>>>>>> 
>>>>>>>>> +1 (binding) from me.
>>>>>>>>> 
>>>>>>>>> Thank you,
>>>>>>>>> Luke
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as the
>>>>>>>>>> relevant discussion threads. I may have granular comments about 1163 and
>>>>>>>>>> 1164 but the overall approach suggested in 1150 looks good to me. I
>>>>>>>>>> especially like that the approach covers two main pain points of operating
>>>>>>>>>> and paying for Kafka today: it allows cross-AZ traffic to be reduced (even
>>>>>>>>>> eliminated in some cases), and it also allows local disk usage by brokers
>>>>>>>>>> to be reduced (if operators opt for a small local cache on follower
>>>>>>>>>> brokers for non-tiered segments).
>>>>>>>>>> 
>>>>>>>>>> +1 (binding)
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> Chris
>>>>>>>>>> 
>>>>>>>>>> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Josep,
>>>>>>>>>>> 
>>>>>>>>>>> Thank you for the detailed response. I appreciate the clarification
>>>>>>>>>>> regarding the distinction between the Inkless POC and the KIP design.
>>>>>>>>>>> 
>>>>>>>>>>> However, my objection is not based on temporary bugs in the fork, but
>>>>>>>>>>> *on architectural gaps in the KIPs themselves* that these implementation
>>>>>>>>>>> issues highlighted. If we are voting to approve the design, the design
>>>>>>>>>>> documents must be structurally complete regarding data safety.
>>>>>>>>>>> 
>>>>>>>>>>> *1. Regarding Storage Leaks (The Missing Design)*
>>>>>>>>>>> You mentioned that cleanup logic "can be defined later." However,
>>>>>>>>>>> KIP-1163 explicitly delegates this responsibility to a separate process,
>>>>>>>>>>> and KIP-1165 (Object Compaction/GC) is currently marked as "Discarded" in
>>>>>>>>>>> the wiki.
>>>>>>>>>>> 
>>>>>>>>>>> We cannot vote to approve a storage engine that has no specified mechanism
>>>>>>>>>>> for garbage collection. The "Upload-then-Commit" pattern described in
>>>>>>>>>>> KIP-1163 structurally creates orphaned segments during broker failures.
>>>>>>>>>>> Without an active KIP defining the reconciliation protocol (since KIP-1165
>>>>>>>>>>> was withdrawn), the proposal effectively describes a system with unbounded
>>>>>>>>>>> storage growth during failure modes. This is a blocking design gap, not an
>>>>>>>>>>> implementation detail.
>>>>>>>>>>> 
>>>>>>>>>>> *2. Regarding EOS (The Coordinator Synchronization Gap)*
>>>>>>>>>>> This is not a misunderstanding of standard Kafka transactions; it is a
>>>>>>>>>>> critique of how KIP-1150 changes them. Standard EOS relies on the Partition
>>>>>>>>>>> Leader to sequence markers and calculate the LSO (Last Stable Offset) in
>>>>>>>>>>> memory. KIP-1150 removes the Leader.
>>>>>>>>>>> 
>>>>>>>>>>> KIP-1164 (Batch Coordinator) must explicitly define the RPC flow between
>>>>>>>>>>> the Transaction Coordinator and the Batch Coordinator to replace the
>>>>>>>>>>> leader's role. Currently, the KIP does not specify how the system prevents
>>>>>>>>>>> a "Split Brain" scenario where a consumer reads ahead of a transaction
>>>>>>>>>>> marker that hasn't yet been sequenced by the Batch Coordinator. This is a
>>>>>>>>>>> protocol-level correctness issue that must be resolved in the text before
>>>>>>>>>>> adoption.
>>>>>>>>>>> 
>>>>>>>>>>> Please note - I am maintaining my objection based on missing
>>>>>>>>>>> specifications, not code bugs.
>>>>>>>>>>> 
>>>>>>>>>>> I respectfully request that we pause the vote until:
>>>>>>>>>>> 
>>>>>>>>>>>    1. A valid design for Garbage Collection (replacing the discarded
>>>>>>>>>>>       KIP-1165) is added to the proposal.
>>>>>>>>>>> 
>>>>>>>>>>>    2. The Transaction/LSO synchronization protocol is explicitly documented
>>>>>>>>>>>       in KIP-1164.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> Vaquar Khan
>>>>>>>>>>> Sr Data Architect
>>>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> *Josep Prat*
>>>>>>>> Sr. Engineering Director, Streaming Services, *Aiven*
>>>>>>>> [email protected]   |   +491715557497
>>>>>>>> aiven.io <https://www.aiven.io>   |   <https://www.facebook.com/aivencloud>
>>>>>>>> <https://www.linkedin.com/company/aiven/>   <https://twitter.com/aiven_io>
>>>>>>>> *Aiven Deutschland GmbH*
>>>>>>>> Alexanderufer 3-7, 10117 Berlin
>>>>>>>> 
>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Kenneth Chen
>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Anatolii Popov
>>>>>>> Senior Software Developer, *Aiven OY*
>>>>>>> m: +358505126242
>>>>>>> w: aiven.io  e: [email protected]
>>>>>>> <https://www.facebook.com/aivencloud>
>>>>>>> <https://www.linkedin.com/company/aiven/>   <https://twitter.com/aiven_io>
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
