Thanks, Jun. I agree with the assessment. I've added it to the document: https://cwiki.apache.org/confluence/display/KAFKA/The+Path+Forward+for+Saving+Cross-AZ+Replication+Costs+KIPs
Thank you.
Luke

On Thu, Aug 7, 2025 at 1:06 AM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Luke,
>
> Thanks for starting the discussion. I took a look at all three proposals and the following is my assessment.
>
> KIP-1150 (diskless):
> Pros:
> * has the most benefits to the users
> -- most complete saving of cross-zone network cost (enabled by the leaderless design)
> -- better durability (by leveraging block storage)
> -- best scalability (by separating data from the metadata)
> * clean architecture (no unnatural, intrusive changes to the existing code base)
> Cons:
> * large effort, but arguably this is what's needed to build a truly cloud-native architecture
>
> KIP-1176 (tiered active segment):
> Pros:
> * limited benefits to the users
> -- saving of cross-zone network cost (limited saving on the producer side)
> * small effort
> Cons:
> * the current availability story is weak
> * it's not clear whether the effort is still small once the details on correctness, cost, and cleanness are figured out
>
> KIP-1183 (shared storage):
> Pros:
> * moderate benefits to the users
> -- saving of cross-zone network cost (limited saving on the producer side and the consumer side)
> -- better durability (by leveraging block storage)
> -- improved scalability
> Cons:
> * weaker availability (no hot standby)
> * scalability not as good as KIP-1150
> * effort to build the plugin is too large
>
> Thanks,
>
> Jun
>
> On Wed, Aug 6, 2025 at 12:58 AM Luke Chen <show...@gmail.com> wrote:
>
> > Hi Josep,
> >
> > Thanks for the update.
> >
> > > Luke, thank you for being proactive and caring about this topic!
> >
> > I believe many community users also care about this topic! :)
> >
> > Look forward to seeing the updated KIP!
> >
> >
> > Hi Stanislav,
> >
> > Yes, it'd be good for the community to decide which way we want to go; whether to go leaderless or leader-based is definitely one of those decisions.
> > And yes, having more than one KIP is also fine with me.
> > It's just that we need a way to move them forward. Otherwise, if one of the KIPs becomes ready for a vote, we can anticipate requests to wait for the other two related KIPs.
> > Any good suggestions?
> >
> > Hi Xinyu,
> >
> > Thanks for the reply.
> > Look forward to seeing the updated KIP!
> >
> > > If the community plans to adopt a leaderless architecture, will the focus be on a complete transition to leaderless, or will both architectures coexist in the long term?
> >
> > I don't think we will abandon the leader-based design, as a lot of users are still relying on it.
> > Besides, KIP-1150 also claims that the existing leader-based protocol works as usual.
> > So, I think they should coexist in the long term.
> >
> >
> > Thank you.
> > Luke
> >
> >
> > On Wed, Aug 6, 2025 at 10:13 AM Xinyu Zhou <yu...@apache.org> wrote:
> >
> > > Hi Luke,
> > >
> > > Thank you for creating this dedicated thread; we definitely need a space to discuss future steps for these topics. I apologize for my delay on KIP-1183 and will provide more details in the coming weeks.
> > >
> > > I agree with Stanislav that we should first focus on the community's direction. Specifically, should we consider introducing a leaderless architecture to Kafka, given that it currently relies on a partitioned, leader-based model?
> > >
> > > From my own perspective, I'm particularly interested in how leaderless and leader-based architectures differ when it comes to handling data locality, which directly affects batching and fetch efficiency, and in the way core features are implemented. For instance, ordering, compaction, transactions, idempotent producers, and queues all have to be realized on the Coordinator in a leaderless design, whereas in a leader-based design they are handled by the partition leader.
> > >
> > > If the community plans to adopt a leaderless architecture, will the focus be on a complete transition to leaderless, or will both architectures coexist in the long term?
> > >
> > > I welcome discussions on this topic and am eager to hear diverse opinions.
> > >
> > > Regards,
> > > Xinyu
> > >
> > > On Wed, Aug 6, 2025 at 3:05 AM Stanislav Kozlovski <stanislavkozlov...@apache.org> wrote:
> > >
> > > > Thank you, Luke, for this wonderful summary and for taking the initiative.
> > > >
> > > > To me, it seems like a large differentiator between KIP-1150 and the others is the leaderless design. The other two don't allow for it.
> > > >
> > > > It sounds productive to focus the discussion on whether the leaderless design is worth it on top of the replication cost savings.
> > > >
> > > > I'm of the opinion that it's worth pursuing, both for the truly zero cross-AZ network cost (no cross-AZ producer traffic) and, perhaps even more importantly, for the zero-state architecture that promises to significantly simplify operations, including autoscaling brokers and scaling throughput per partition.
> > > >
> > > > It would be great if the folks at Aiven could address the concerns regarding queue and transaction support. I don't think these things need to ship with v1, but it would be wise to ensure nothing in the architecture blocks these features from being shipped in the future.
> > > >
> > > > KIP-1176 is also very cool; addressing the acks=1 case will still be necessary. I think it's a necessary feature to implement, but I'd be disappointed if that's the only diskless solution the community agrees on.
> > > >
> > > > A good path, if possible, may be to merge KIP-1150 and KIP-1176.
> > > >
> > > > If instead the community decides leaderless isn't necessary, then KIP-1183 seems a good fit.
Happy to hear if anyone disagrees. > > > > > > > > On 2025/08/05 14:30:45 Josep Prat wrote: > > > > > Hi Luke and community! > > > > > > > > > > Luke, thank you for being proactive and caring about this topic! > > > > > > > > > > In the meantime we have been keeping ourselves busy pushing our > > > > > implementation of KIP-1150 to production to validate our > assumptions > > > and > > > > > confirm its strengths while discovering its weaknesses. > > > > > Now, after gathering some experience running it, we are (as I'm > > writing > > > > > this, gathered in the same room) working on an improved proposal > for > > > > > KIP-1150 that also addresses the concerns from the community. > > > > > We expect to share the updated KIP in the next couple of weeks. > > > > > > > > > > We apologize for the recent period of silence and are committed to > > more > > > > > regular communication as we move forward. > > > > > > > > > > Best, > > > > > > > > > > > > > > > On Tue, Aug 5, 2025 at 10:31 AM Luke Chen <show...@gmail.com> > wrote: > > > > > > > > > > > Hi all, > > > > > > > > > > > > The Kafka community is currently seeing an unprecedented > situation > > > with > > > > > > three KIPs (KIP-1150, IP-1176, KIP-1183) simultaneously > addressing > > > the > > > > same > > > > > > challenge of high replication costs when running Kafka across > > > multiple > > > > > > cloud availability zones. Each KIP offers a different solution to > > > this > > > > > > issue. While diversity of innovative ideas is a key strength of > > > > open-source > > > > > > projects, it creates a burden for reviewers and users who must > > > compare > > > > and > > > > > > comment on multiple proposals simultaneously. Furthermore, > > discussion > > > > > > around the three KIPs has stalled for over two months now. This > > could > > > > be > > > > > > due to the authors being hesitant to proceed due to the existence > > of > > > > > > alternative, potentially conflicting, solutions. 
> > > > > > Addressing replication cost is a key concern of Kafka's user base, and we should try to move the conversation forward if we can.
> > > > > >
> > > > > > From what I understand, these three KIPs are not mutually exclusive. However, adopting all three KIPs in Kafka might not be what we want. Thus, I would like to *start a discussion on how we could move the conversation forward*.
> > > > > >
> > > > > > To save time for KIP readers and reviewers, I have created this document [1] to help summarize each of the KIPs and describe their current status. *I hope to get some suggestions and feedback from the community*.
> > > > > >
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/The+Path+Forward+for+Saving+Cross-AZ+Replication+Costs+KIPs
> > > > > >
> > > > > > KIP-1150: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > > KIP-1176: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment
> > > > > > KIP-1183: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1183%3A+Unified+Shared+Storage
> > > > > >
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > >
> > > > > --
> > > > > *Josep Prat*
> > > > > Sr. Engineering Director, Streaming Services, *Aiven*