Thanks, Giuseppe, for the explanation! It makes sense to me.

Best regards,
Yuxia

----- Original Message -----
From: "Giuseppe Lillo" <giuseppe.li...@aiven.io.INVALID>
To: "dev" <dev@kafka.apache.org>
Sent: Tuesday, April 29, 2025 12:14:14 AM
Subject: Re: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator

Hello Yuxia, thanks for your question and interest!

When producing, the broker will call the relevant Batch Coordinator with a CommitBatches request. The Batch Coordinator will then write the metadata about these batches into the __diskless-metadata topic and update its internal state, which is persisted in SQLite. It will then reply with the assigned offsets. Read-only Batch Coordinators will also replicate that metadata into their own internal state.

When consuming, the broker will call the relevant Batch Coordinator with a FindBatches request. The Batch Coordinator will search for the requested offsets within its internal state and reply with the batch coordinates (object key, offset within the object).

In your example, I suppose that A, B and C are all messages written to the same topic-partition. The problem you described is solved by the idempotent producer. In order to support idempotent producers in Diskless topics, information about the producer ID and sequence numbers must be communicated to the Batch Coordinator when committing a new batch. We included information about the producer (producer id and producer epoch) and the sequence numbers (base sequence, last sequence) both in the commitFile public interface and in the CommitBatches API. When serving a CommitBatches request that includes idempotent producer information, the Batch Coordinator will also check the request against its internal state to determine whether it is a duplicate or contains out-of-order messages.

Best regards,
Giuseppe

On Thu, Apr 24, 2025 at 4:24 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> Hi!
>
> Thanks for the great work, and I'm excited to see it happen. This KIP
> looks good to me.
> It seems the Batch Coordinator is very important in the diskless
> implementation. Could you explain the implementation in more detail? I
> think it would be much better to show what the Batch Coordinator does
> when a write, read, or other request comes in.
>
> I'm also wondering how it "chooses the total ordering for writes" and
> what the "information necessary to support idempotent producers" is.
> I'm thinking about the following case:
> 1: the client is going to send messages A, B, C to Kafka
> 2: the client sends A, B to broker1; broker1 receives A, B
> 3: broker1 goes down; the client sends C to broker2
> 4: since broker1 is down, the client sees the send of A, B fail and
>    retries sending A, B to broker2
> Then how can the Batch Coordinator choose the total order to be A, B, C?
>
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Ivan Yurchenko" <i...@ivanyu.me>
> To: "dev" <dev@kafka.apache.org>
> Sent: Wednesday, April 23, 2025 5:46:46 PM
> Subject: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator
>
> Hi all!
>
> We want to start the discussion thread for KIP-1164: Topic Based Batch
> Coordinator [1], which is a sub-KIP for KIP-1150 [2].
>
> Let's use the main KIP-1150 discuss thread [3] for high-level questions,
> motivation, and general direction of the feature and this thread for
> discussing the batch coordinator interface and the proposed topic-based
> implementation.
>
> Best,
> Ivan
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
> [2]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> [3] https://lists.apache.org/thread/ljxc495nf39myp28pmf77sm2xydwjm6d
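
For readers following the thread, here is a rough sketch of the flow Giuseppe describes: committing batches with idempotent-producer checks, and looking up batch coordinates by offset. It is a toy in-memory model, not the KIP's implementation. All names (ToyBatchCoordinator, commit_batch, find_batch, the status strings) are made up for illustration, the __diskless-metadata topic and the SQLite state are replaced by plain Python structures, and only the most recent batch per producer is remembered for duplicate detection.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    producer_id: int
    producer_epoch: int
    base_sequence: int
    last_sequence: int
    object_key: str   # hypothetical: object that stores the batch
    byte_offset: int  # hypothetical: position of the batch inside that object


@dataclass
class PartitionState:
    next_offset: int = 0
    # Committed batches as (base_offset, last_offset, object_key, byte_offset).
    batches: list = field(default_factory=list)
    # producer_id -> (epoch, base_sequence, last_sequence, base_offset)
    # of the last committed batch (assumption: one batch remembered).
    producers: dict = field(default_factory=dict)


class ToyBatchCoordinator:
    """In-memory stand-in for the coordinator's internal state."""

    def __init__(self):
        self.partitions = {}

    def commit_batch(self, tp, batch):
        state = self.partitions.setdefault(tp, PartitionState())
        last = state.producers.get(batch.producer_id)
        if last is None or batch.producer_epoch > last[0]:
            expected = 0  # first batch of a new producer or new epoch
        elif batch.producer_epoch < last[0]:
            return ("FENCED", None)  # stale epoch
        else:
            _, base_seq, last_seq, base_offset = last
            if (batch.base_sequence, batch.last_sequence) == (base_seq, last_seq):
                # Retry of an already committed batch: return the original offset.
                return ("DUPLICATE", base_offset)
            expected = last_seq + 1
        if batch.base_sequence != expected:
            return ("OUT_OF_ORDER", None)

        # Accept: assign offsets in commit order, which fixes the total order.
        count = batch.last_sequence - batch.base_sequence + 1
        base_offset = state.next_offset
        state.next_offset += count
        state.batches.append(
            (base_offset, base_offset + count - 1, batch.object_key, batch.byte_offset)
        )
        state.producers[batch.producer_id] = (
            batch.producer_epoch, batch.base_sequence, batch.last_sequence, base_offset
        )
        return ("OK", base_offset)

    def find_batch(self, tp, offset):
        """Return (object_key, byte_offset) of the batch holding `offset`."""
        state = self.partitions.get(tp)
        if state is None:
            return None
        for base, last, key, pos in state.batches:
            if base <= offset <= last:
                return (key, pos)
        return None
```

Applied to Yuxia's scenario, with A and B in one batch (sequences 0 to 1) and C in another (sequence 2): if C reaches the coordinator first, it is rejected as out of order and the producer retries it; A,B is committed at offset 0; a retried A,B is recognised as a duplicate and gets the same offset back; C is then committed at offset 2. The resulting order is A, B, C regardless of which broker forwarded each request.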