Hi Yuxia!

Thank you for the question. We've just opened the discussion thread for
KIP-1164 [1]. If you don't mind, could you please repost your question
there? This would help a lot to keep the branching discussion manageable.

Best,
Ivan

[1] https://lists.apache.org/thread/m9l6lbqv2cffxtz5frypylmqjd7bsqoz

On Wed, Apr 23, 2025, at 09:39, yuxia wrote:
> Hi!
>
> Thanks for the great work, and I'm excited to see it happen. These KIPs
> look good to me.
> I have a question about the Batch Coordinator in KIP-1164.
> It seems the Batch Coordinator is very important in the diskless
> implementation; could you explain the implementation in more detail?
> In particular, I'm wondering how it "chooses the total ordering for
> writes" and what the "information necessary to support idempotent
> producers" is.
> I'm thinking about the following case:
> 1: The client is going to send messages A, B, C to Kafka.
> 2: The client sends A, B to broker1, and broker1 receives A, B.
> 3: broker1 goes down, and the client sends C to broker2.
> 4: Since broker1 is down, the client sees the send of A, B fail and
> retries sending A, B to broker2.
> Then, how can the Batch Coordinator choose the total order to be A, B, C?
>
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Christo Lolov" <christolo...@gmail.com>
> To: dev@kafka.apache.org
> Sent: Tuesday, April 22, 2025, 9:04:06 PM
> Subject: [SPAM]Re: [DISCUSS] KIP-1150 Diskless Topics
>
> Hello!
>
> I want to start by saying that this is a big and impressive undertaking
> and I am really excited to see its progression! I am posting my initial
> comments in this thread, but they span a few of the child KIPs. Let me know
> which questions you would like to move elsewhere. I understand that you
> first want a consensus on the direction, but I think I still need designs
> for a few of the core areas to form an opinion.
>
> CL - 1: In the same vein as Luke's comment, it would be very useful to see
> explicitly what will stay on disk and what won't.
>
> CL - 2: It would also be very useful to explicitly say what the
> interactions will be with the KRaft-related topic - would it be diskless or
> on disk?
>
> CL - 3: Do you envision that this feature will work with KIP-932?
>
> CL - 4: KIP-1163 says that there won't be a production-grade implementation
> of the Batch Coordinator and KIP-1164 says the opposite. Which one would it
> be?
>
> CL - 5: KIP-1163 says that the Batch Coordinator doesn't need to concern
> itself with object storage and KIP-1164 says that it will manage the
> physical deletion of objects. Which one would it be?
>
> CL - 6: Could you go into a bit more detail on whether we would need changes
> to the Kafka clients to achieve what you are proposing? If no changes are
> necessary to the clients, then what changes would be necessary to brokers to
> make clients believe they are communicating with the "right" brokers? Would
> those make it into KIP-1163?
>
> CL - 7: Where and how would indexes (offset, time, producer snapshot) live?
> In particular, I am interested in how the reference Batch Coordinator will
> quickly (for a certain definition of quickly) rebuild state.
>
> CL - 8: I think that we try to have as few Kafka dependencies as possible.
> The closure of compile + runtime broker-only dependencies is currently 16
> (if I have done my analysis correctly). What problem(s) do you envision
> w.r.t. spilling to disk that we wouldn't be able to solve with our own
> implementation and that would require SQLite?
>
> Once again, great work so far!
>
> Best,
> Christo
>
> On Sun, 20 Apr 2025 at 23:04, Stanislav Kozlovski <
> stanislavkozlov...@apache.org> wrote:
>
> > This is an amazing initiative. Huge kudos for driving it. We should
> > incorporate it one way or another.
> >
> > I have a suggestion I'd like to hear your thoughts on. I'm cognizant of
> > the effort required for KIP-1150, so I don't necessarily want to increase
> > the scope - but thinking about this early on can help the design later
> > on, plus shape the motivation.
> >
> > The idea is to introduce support for replicationless acks=1 writes. This
> > would be very similar to how AutoMQ's WAL+S3 feature works, as far as I
> > understand it.
> >
> > Could we have Diskless Brokers serve acks=1 produce requests by
> > immediately persisting the data on disk (not sure if we should use fsync
> > or not), responding to the request, and then still asynchronously
> > batching said data with regular acks=all data via the
> > "diskless.append.commit.interval.ms" /
> > "diskless.append.buffer.max.bytes" configs?
> >
> > If I'm not mistaken, this would offer very similar guarantees to today's
> > acks=1 requests, where a period of low durability exists between the time
> > the leader persists to its local disk and the time all followers persist
> > to their disks. Granted, in traditional Kafka this period is probably no
> > more than a hundred milliseconds, and here it'd be at least 2x higher.
> > But I believe that given the major savings, many acks=1 users will be
> > happy to make the tradeoff.
> >
> > While on the topic of cost, I hastily ran some cost calculations and
> > found that the KIP should reduce replication costs by more than 80x
> > (https://topicpartition.io/blog/kip-1150-diskless-topics-in-apache-kafka).
> > There may be some errors there as the batch coordinator RPC and merging
> > aren't fully fleshed out - but I believe it's directionally correct. It
> > may be worth adding that to the motivation in one way or another - so as
> > to be able to quantify the numbers.
> >
> > Best,
> > Stanislav
> >
> > On 2025/04/19 11:02:30 Ivan Yurchenko wrote:
> > > Hi Ziming,
> > >
> > > > 1. Is this feature available with just a minor config adjustment, or
> > > > will it intrude heavily on the current code? For example, AutoMQ is
> > > > 100% compatible with Kafka and doesn't intrude on the code heavily.
> > >
> > > If we speak about the part visible to the user, we expect:
> > > 1. Minimal changes to the client code (with a potential fallback with
> > > even 0 changes for older clients).
> > > 2. A limited set of new configurations for brokers and topics.
> > > Otherwise, this should be a perfectly normal Apache Kafka.
> > >
> > > > 2. Though we are not discussing implementation details, it's worth
> > > > giving some high-level architecture ideas, and it would be better to
> > > > compare with AutoMQ-like systems.
> > >
> > > There's quite a bit of high-level architecture in the sub-KIP
> > > KIP-1163 [1].
> > > We didn't do a comparison with AutoMQ (to the best of our knowledge,
> > > they have a fairly different approach), but if this helps the community
> > > to get the idea then sure, we should do this.
> > >
> > > > 3. As for what we will provide through it, I think we will just
> > > > provide a common interface and put implementations in other repos,
> > > > just as we did for Kafka Connect and Kafka Tiered Storage.
> > >
> > > This is true for the component that does CRUD operations on object
> > > storage. However, for the batch coordinator we would like to provide a
> > > decent out-of-the-box, self-contained (i.e. no external deps like a
> > > database) implementation that many Kafka users who don't have
> > > challenging scaling requirements would benefit from. There's the
> > > sub-KIP KIP-1164 [2] for this.
> > >
> > > > 4. How do we deal with the KRaft-related protocol? Since the metadata
> > > > topic (__cluster_metadata) is managed differently, will we, through
> > > > this KIP, close the gap between __cluster_metadata and data topics by
> > > > putting metadata in object storage? If so, will there be no standby
> > > > controller, since standby controllers are the __cluster_metadata
> > > > followers and there will be no followers?
> > >
> > > The current plan is not to work directly with KRaft and
> > > __cluster_metadata. What we need from KRaft is 3 types of events:
> > > topic/partition creation, topic deletion, and topic configuration
> > > changes (with the possibility to limit this set to topic deletion
> > > only). We think that'd be enough if we have a "bridge" that watches for
> > > these events in __cluster_metadata and reflects them in the batch
> > > coordinator (basically, by sending requests).
> > > Does this answer the question, or did I maybe misunderstand?
> > >
> > > Best,
> > > Ivan
> > >
> > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
> > > [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
> > >
> > > On Fri, Apr 18, 2025, at 12:42, Ziming Deng wrote:
> > > > Hi Josep,
> > > >
> > > > This would be a fascinating feature; some well-known Kafka users are
> > > > running Kafka in cloud-native environments. As far as I know, there
> > > > are already some derivative versions of Kafka that provide this
> > > > feature. For example, I am using AutoMQ
> > > > (https://github.com/AutoMQ/automq) in my environment, which
> > > > significantly helped me reduce costs. So I think it's worthwhile to
> > > > clarify some related details:
> > > > 1. Is this feature available with just a minor config adjustment, or
> > > > will it intrude heavily on the current code? For example, AutoMQ is
> > > > 100% compatible with Kafka and doesn't intrude on the code heavily.
> > > > 2. Though we are not discussing implementation details, it's worth
> > > > giving some high-level architecture ideas, and it would be better to
> > > > compare with AutoMQ-like systems.
> > > > 3. As for what we will provide through it, I think we will just
> > > > provide a common interface and put implementations in other repos,
> > > > just as we did for Kafka Connect and Kafka Tiered Storage.
> > > > 4. How do we deal with the KRaft-related protocol? Since the metadata
> > > > topic (__cluster_metadata) is managed differently, will we, through
> > > > this KIP, close the gap between __cluster_metadata and data topics by
> > > > putting metadata in object storage? If so, will there be no standby
> > > > controller, since standby controllers are the __cluster_metadata
> > > > followers and there will be no followers?
> > > >
> > > > --
> > > > Ziming
> > > >
> > > > > On Apr 16, 2025, at 19:58, Josep Prat <josep.p...@aiven.io.INVALID>
> > > > > wrote:
> > > > >
> > > > > Hi Kafka Devs!
> > > > >
> > > > > We want to start a new KIP discussion about introducing a new type
> > > > > of topic that would make use of Object Storage as the primary
> > > > > source of storage. However, as this KIP is big, we decided to split
> > > > > it into multiple related KIPs.
> > > > > We have the motivational KIP-1150 (
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > ) that aims to discuss whether Apache Kafka should aim to have this
> > > > > type of feature at all. This KIP doesn't go into details on how to
> > > > > implement it. This follows the same approach used when we discussed
> > > > > KRaft.
> > > > >
> > > > > But as we know that it is sometimes really hard to discuss on that
> > > > > meta level, we also created several sub-KIPs (linked in KIP-1150)
> > > > > that offer an implementation of this feature.
> > > > >
> > > > > We kindly ask you to use the proper DISCUSS threads for each type
> > > > > of concern and keep this one to discuss whether Apache Kafka wants
> > > > > to have this feature or not.
> > > > >
> > > > > Thanks in advance on behalf of all the authors of this KIP.
> > > > >
> > > > > ------------------
> > > > > Josep Prat
> > > > > Open Source Engineering Director, Aiven
> > > > > josep.p...@aiven.io | +491715557497 | aiven.io
> > > > > Aiven Deutschland GmbH
> > > > > Alexanderufer 3-7, 10117 Berlin
> > > > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > > > > Anna Richardson, Kenneth Chen
> > > > > Amtsgericht Charlottenburg, HRB 209739 B