I've re-read KIP-1150, and still agree this is what we need for Apache Kafka.
+1 (binding) from me.

Thank you,
Luke

On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton <[email protected]> wrote:

> Hi all,
>
> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as the
> relevant discussion threads. I may have granular comments about 1163 and
> 1164 but the overall approach suggested in 1150 looks good to me. I
> especially like that the approach covers two main pain points of operating
> and paying for Kafka today: it allows cross-AZ traffic to be reduced (even
> eliminated in some cases), and it also allows local disk usage by brokers
> to be reduced (if operators opt for a small local cache on follower brokers
> for non-tiered segments).
>
> +1 (binding)
>
> Cheers,
>
> Chris
>
> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]> wrote:
>
> > Hi Josep,
> >
> > Thank you for the detailed response. I appreciate the clarification
> > regarding the distinction between the Inkless POC and the KIP design.
> >
> > However, my objection is not based on temporary bugs in the fork, but *on
> > architectural gaps in the KIPs themselves* that these implementation
> > issues highlighted. If we are voting to approve the design, the design
> > documents must be structurally complete regarding data safety.
> >
> > *1. Regarding Storage Leaks (The Missing Design)* You mentioned that
> > cleanup logic "can be defined later." However, KIP-1163 explicitly
> > delegates this responsibility to a separate process, and KIP-1165 (Object
> > Compaction/GC) is currently marked as "Discarded" in the wiki.
> >
> > We cannot vote to approve a storage engine that has no specified
> > mechanism for garbage collection. The "Upload-then-Commit" pattern
> > described in KIP-1163 structurally creates orphaned segments during
> > broker failures. Without an active KIP defining the reconciliation
> > protocol (since KIP-1165 was withdrawn), the proposal effectively
> > describes a system with unbounded storage growth during failure modes.
> > This is a blocking design gap, not an implementation detail.
> >
> > *2. Regarding EOS (The Coordinator Synchronization Gap)* This is not a
> > misunderstanding of standard Kafka transactions; it is a critique of how
> > KIP-1150 changes them. Standard EOS relies on the Partition Leader to
> > sequence markers and calculate the LSO (Last Stable Offset) in memory.
> > KIP-1150 removes the Leader.
> >
> > KIP-1164 (Batch Coordinator) must explicitly define the RPC flow between
> > the Transaction Coordinator and the Batch Coordinator to replace the
> > leader's role. Currently, the KIP does not specify how the system
> > prevents a "Split Brain" scenario where a consumer reads ahead of a
> > transaction marker that hasn't yet been sequenced by the Batch
> > Coordinator. This is a protocol-level correctness issue that must be
> > resolved in the text before adoption.
> >
> > Please note - I am maintaining my objection based on missing
> > specifications, not code bugs.
> >
> > I respectfully request that we pause the vote until:
> >
> > A valid design for Garbage Collection (replacing the discarded
> > KIP-1165) is added to the proposal.
> >
> > The Transaction/LSO synchronization protocol is explicitly documented
> > in KIP-1164.
> >
> > Regards,
> >
> > Vaquar Khan
> > Sr Data Architect
> > https://www.linkedin.com/in/vaquar-khan-b695577/
