Hi all,

Thanks, everyone, for the votes and discussion. I'm happy to announce that KIP-1150 is now accepted with 9 binding +1 votes (Stanislav Kozlovski, Chris Egerton, Luke Chen, Josep Prat, Greg Harris, Andrew Schofield, Jun Rao, Satish Duggana, Chia-Ping Tsai) and 5 non-binding +1 votes (Henry Cai, Jian Fu, Andrew Mills, Varun Ghai, Vaquar Khan).

As a reminder, KIP-1150 is a motivational KIP: it establishes community consensus on whether Apache Kafka should pursue object storage as the primary storage backend for a new topic type, without prescribing a specific implementation. All implementation details will be defined and discussed in the follow-up KIPs:

- KIP-1163: Diskless Core, DISCUSS thread: https://lists.apache.org/thread/3dj67w04r7pcmlytl912gv69j22o3g4j
- KIP-1164: Diskless Coordinator, DISCUSS thread: https://lists.apache.org/thread/m9l6lbqv2cffxtz5frypylmqjd7bsqoz

We encourage everyone to continue the conversation in those threads. Each of those KIPs will have its own discussion and voting process.

Thanks again to all who participated!

~Anatolii.

On Mon, Mar 2, 2026 at 7:31 AM Chia-Ping Tsai <[email protected]> wrote:

> +1 (binding)
>
> > On Feb 28, 2026, at 11:00 AM, Satish Duggana <[email protected]> wrote:
> >
> > Thanks for the KIP.
> > I've reviewed the updated KIP and agree with the motivation behind KIP-1150, overall LGTM.
> > It seems KIP-1163 and KIP-1164 require more details, which we can discuss in those respective threads.
> >
> > +1 (binding) for KIP-1150.
> >
> > ~Satish.
> >
> >> On Fri, 27 Feb 2026 at 23:28, Jun Rao via dev <[email protected]> wrote:
> >>
> >> Hi, Anatolii,
> >>
> >> Thanks for the KIP. The link you posted for KIP-1150 seems incorrect and it points to KIP-1163. Otherwise, +1.
> >>
> >> Jun
> >>
> >>> On Wed, Feb 25, 2026 at 2:59 PM vaquar khan <[email protected]> wrote:
> >>>
> >>> Fair point, Chris. I agree with that architectural boundary. KIP-1150 successfully sets the high-level mandate, and we can rigorously tackle the exact EOS and RPC mechanics over in the KIP-1164 thread.
> >>>
> >>> Andrew, I am fully aligned with you on the massive operational value of eliminating those cross-AZ replication costs. It is absolutely the right strategic direction for Kafka.
> >>>
> >>> Since my initial concerns on the storage side are resolved, and we are aligned on where the transactional interfaces will be finalized, I am officially withdrawing my objection.
> >>> +1 (non-binding) for KIP-1150.
> >>>
> >>> I will migrate my open questions over to the KIP-1164 discussion thread so we can lock down the data safety details there.
> >>>
> >>> Regards,
> >>> Vaquar Khan
> >>>
> >>> On Wed, 25 Feb 2026 at 15:24, Chris Egerton <[email protected]> wrote:
> >>>
> >>>> Hi Vaquar,
> >>>>
> >>>>> Let me know what you guys think about locking down the text for these interfaces.
> >>>>
> >>>> I think this KIP has the appropriate level of detail and any concerns about EOS can be addressed in the relevant sub-KIP.
> >>>>
> >>>> Chris
> >>>>
> >>>> On Wed, Feb 25, 2026 at 4:20 PM vaquar khan <[email protected]> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> First off, thanks to the authors for the Feb 12th updates to KIP-1163. Adding the periodic reconciliation loop clears up my concerns about the orphaned "Upload-then-Commit" segments, so I'm officially withdrawing my objection on the storage leak issue.
> >>>>>
> >>>>> Chris and Greg, since you both mentioned digging into the 1164 details, I wanted to pick your brains on how Exactly-Once Semantics (EOS) is going to safely operate here. In standard Kafka, the Partition Leader is our single serialization point. It receives the data, tracks ongoing transactions via the ProducerStateManager, and calculates the Last Stable Offset (LSO) locally. Since KIP-1150 removes the leader, the Batch Coordinator takes over. But as I read through the current text, a few critical synchronization barriers seem to be missing to me:
> >>>>>
> >>>>> 1. LSO Calculation: How exactly will the Batch Coordinator maintain and calculate the LSO? Justine Olshan brought this up earlier too. Will the coordinator run its own ProducerStateManager to track ongoing transactions, or is there a totally different state machine planned?
> >>>>>
> >>>>> 2. RPC Protocol: What's the exact synchronization protocol between the legacy Transaction Coordinator and the new Batch Coordinator? When the Txn Coordinator sends a commit marker, how does the Batch Coordinator actually verify it has received all the prerequisite data batches for that specific transaction epoch?
> >>>>>
> >>>>> 3. Delayed Data Race Condition: Let's say a broker hits a GC pause right *after* uploading a batch to object storage, but *before* committing the coordinates. If the transaction commit marker arrives at the Coordinator first, what happens? Does the Coordinator wait? If not, couldn't the transaction commit with missing data, completely violating read_committed isolation?
> >>>>>
> >>>>> The KIP vaguely mentions *transactional checks* but leaves the actual commit protocol and public interfaces undefined right now. I'm not saying the design itself is broken, but I really think myself and others need to see these RPC flows explicitly documented before we implement and adopt this. Otherwise, we risk baking in some severe data isolation headaches down the line.
> >>>>>
> >>>>> Let me know what you guys think about locking down the text for these interfaces.
> >>>>>
> >>>>> Regards,
> >>>>> Vaquar Khan
> >>>>>
> >>>>> On Wed, 25 Feb 2026 at 10:33, Greg Harris via dev <[email protected]> wrote:
> >>>>>
> >>>>>> Hey all,
> >>>>>>
> >>>>>> I'm excited to discuss more details in 1163 and 1164 with everyone.
> >>>>>>
> >>>>>> +1 (binding)
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Greg
> >>>>>>
> >>>>>> On Wed, Feb 25, 2026 at 1:08 AM Anatolii Popov via dev <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Given the importance of this KIP, we want to keep the vote open for a few more days to give time to people who had comments in the DISCUSS thread to cast their vote if they want.
> >>>>>>>
> >>>>>>> On Wed, Feb 25, 2026 at 10:47 AM Josep Prat via dev <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>> As a co-author of the KIP, I want to explicitly cast my vote for this KIP.
> >>>>>>>>
> >>>>>>>> +1 (binding)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Feb 25, 2026 at 9:02 AM Luke Chen <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> I've re-read KIP-1150, and still agree this is what we need for Apache Kafka.
> >>>>>>>>>
> >>>>>>>>> +1 (binding) from me.
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>> Luke
> >>>>>>>>>
> >>>>>>>>> On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as the relevant discussion threads. I may have granular comments about 1163 and 1164 but the overall approach suggested in 1150 looks good to me. I especially like that the approach covers two main pain points of operating and paying for Kafka today: it allows cross-AZ traffic to be reduced (even eliminated in some cases), and it also allows local disk usage by brokers to be reduced (if operators opt for a small local cache on follower brokers for non-tiered segments).
> >>>>>>>>>>
> >>>>>>>>>> +1 (binding)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Chris
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Josep,
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you for the detailed response. I appreciate the clarification regarding the distinction between the Inkless POC and the KIP design.
> >>>>>>>>>>>
> >>>>>>>>>>> However, my objection is not based on temporary bugs in the fork, but *on architectural gaps in the KIPs themselves* that these implementation issues highlighted. If we are voting to approve the design, the design documents must be structurally complete regarding data safety.
> >>>>>>>>>>>
> >>>>>>>>>>> *1. Regarding Storage Leaks (The Missing Design)* You mentioned that cleanup logic "can be defined later." However, KIP-1163 explicitly delegates this responsibility to a separate process, and KIP-1165 (Object Compaction/GC) is currently marked as "Discarded" in the wiki.
> >>>>>>>>>>>
> >>>>>>>>>>> We cannot vote to approve a storage engine that has no specified mechanism for garbage collection.
> >>>>>>>>>>> The "Upload-then-Commit" pattern described in KIP-1163 structurally creates orphaned segments during broker failures. Without an active KIP defining the reconciliation protocol (since KIP-1165 was withdrawn), the proposal effectively describes a system with unbounded storage growth during failure modes. This is a blocking design gap, not an implementation detail.
> >>>>>>>>>>>
> >>>>>>>>>>> *2. Regarding EOS (The Coordinator Synchronization Gap)* This is not a misunderstanding of standard Kafka transactions; it is a critique of how KIP-1150 changes them. Standard EOS relies on the Partition Leader to sequence markers and calculate the LSO (Last Stable Offset) in memory. KIP-1150 removes the Leader.
> >>>>>>>>>>>
> >>>>>>>>>>> KIP-1164 (Batch Coordinator) must explicitly define the RPC flow between the Transaction Coordinator and the Batch Coordinator to replace the leader's role. Currently, the KIP does not specify how the system prevents a "Split Brain" scenario where a consumer reads ahead of a transaction marker that hasn't yet been sequenced by the Batch Coordinator. This is a protocol-level correctness issue that must be resolved in the text before adoption.
> >>>>>>>>>>>
> >>>>>>>>>>> Please note - I am maintaining my objection based on missing specifications, not code bugs.
> >>>>>>>>>>>
> >>>>>>>>>>> I respectfully request that we pause the vote until:
> >>>>>>>>>>>
> >>>>>>>>>>> - A valid design for Garbage Collection (replacing the discarded KIP-1165) is added to the proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> - The Transaction/LSO synchronization protocol is explicitly documented in KIP-1164.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> Vaquar Khan
> >>>>>>>>>>> Sr Data Architect
> >>>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Josep Prat*
> >>>>>>>> Sr. Engineering Director, Streaming Services, *Aiven*
> >>>>>>>> [email protected] | +491715557497
> >>>>>>>> aiven.io <https://www.aiven.io>
> >>>>>>>> *Aiven Deutschland GmbH*
> >>>>>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>>>>>
> >>>>>>>> Managing Directors: Oskari Saarenmaa, Kenneth Chen
> >>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Anatolii Popov
> >>>>>>> Senior Software Developer, *Aiven OY*
> >>>>>>> m: +358505126242
> >>>>>>> w: aiven.io e: [email protected]
> >>>>>>>

--
Anatolii Popov
Senior Software Developer, *Aiven OY*
m: +358505126242
w: aiven.io e: [email protected]
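
For readers following the EOS questions quoted in this thread, here is a minimal sketch of the leader-side bookkeeping they refer to. It is not taken from KIP-1150, KIP-1163, or KIP-1164. It only illustrates the usual definition of the Last Stable Offset: the first offset of the earliest still-open transaction, or the high watermark when no transaction is open. The class and method names are hypothetical, and the sketch assumes every append is fully replicated immediately, which a real broker does not guarantee.

```python
class LeaderTxnState:
    """Hypothetical, simplified per-partition transaction bookkeeping."""

    def __init__(self):
        # Assumption: every append is instantly replicated, so the high
        # watermark simply tracks the next offset to be written.
        self.high_watermark = 0
        # producer_id -> first offset of that producer's open transaction
        self.open_txn_first_offset = {}

    def on_transactional_append(self, producer_id, base_offset, last_offset):
        # Only the first batch of a transaction pins its starting offset.
        self.open_txn_first_offset.setdefault(producer_id, base_offset)
        self.high_watermark = max(self.high_watermark, last_offset + 1)

    def on_end_marker(self, producer_id, marker_offset):
        # A commit or abort marker closes the producer's open transaction.
        self.open_txn_first_offset.pop(producer_id, None)
        self.high_watermark = max(self.high_watermark, marker_offset + 1)

    def last_stable_offset(self):
        # With no open transactions, everything below the high watermark
        # is decided, so the LSO equals the high watermark.
        if not self.open_txn_first_offset:
            return self.high_watermark
        return min(self.open_txn_first_offset.values())


state = LeaderTxnState()
state.on_transactional_append(producer_id=1, base_offset=0, last_offset=4)
state.on_transactional_append(producer_id=2, base_offset=5, last_offset=9)
print(state.last_stable_offset())  # 0: producer 1's transaction is still open
state.on_end_marker(producer_id=1, marker_offset=10)
print(state.last_stable_offset())  # 5: producer 2's transaction is still open
state.on_end_marker(producer_id=2, marker_offset=11)
print(state.last_stable_offset())  # 12: no open transactions remain
```

A read_committed consumer only reads below this offset, which is why the thread asks which component maintains the equivalent state once there is no partition leader.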
