Re: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator

yuxia Mon, 28 Apr 2025 23:00:12 -0700

Thanks Giuseppe for the explanation! It makes sense to me.

Best regards,
Yuxia

----- Original Message -----
From: "Giuseppe Lillo" <[email protected]>
To: "dev" <[email protected]>
Sent: Tuesday, April 29, 2025 12:14:14 AM
Subject: Re: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator

Hello Yuxia, thanks for your question and interest!

When producing, the broker will call the relevant Batch Coordinator with a
CommitBatches request.
The Batch Coordinator will then write the metadata about these batches into
the __diskless-metadata topic and update its internal state, which is
persisted in SQLite. It will then reply with the assigned offsets.
Read-only Batch Coordinators will also replicate this metadata into their
own internal state.
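
As a rough illustration of the produce path described above, here is a
minimal in-memory sketch. It is not the KIP's implementation: the class and
field names (ToyBatchCoordinator, BatchRequest, CommittedBatch) are
hypothetical, a Python list stands in for the __diskless-metadata topic, and
a dict stands in for the SQLite-backed state.

```python
from dataclasses import dataclass, field


@dataclass
class BatchRequest:
    object_key: str    # object in storage that holds the batch
    byte_offset: int   # where the batch starts inside that object
    record_count: int  # number of records in the batch


@dataclass
class CommittedBatch:
    base_offset: int
    last_offset: int
    object_key: str
    byte_offset: int


@dataclass
class ToyBatchCoordinator:
    # Assumption: a list stands in for the __diskless-metadata topic.
    metadata_log: list = field(default_factory=list)
    # Assumption: a dict stands in for the SQLite-backed internal state.
    state: dict = field(default_factory=dict)
    next_offset: dict = field(default_factory=dict)

    def commit_batches(self, topic_partition, requests):
        """Assign offsets in arrival order and return the base offsets."""
        assigned = []
        for req in requests:
            base = self.next_offset.get(topic_partition, 0)
            batch = CommittedBatch(
                base, base + req.record_count - 1, req.object_key, req.byte_offset
            )
            # 1. Append the batch metadata to the log (the "topic").
            self.metadata_log.append((topic_partition, batch))
            # 2. Update the queryable internal state.
            self.state.setdefault(topic_partition, []).append(batch)
            self.next_offset[topic_partition] = batch.last_offset + 1
            assigned.append(base)
        return assigned


coordinator = ToyBatchCoordinator()
offsets = coordinator.commit_batches(
    "t-0", [BatchRequest("obj-1", 0, 3), BatchRequest("obj-1", 512, 2)]
)
print(offsets)  # [0, 3]
```

In this sketch a read-only coordinator would rebuild the same state by
replaying metadata_log in order.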

When consuming, the broker will call the relevant Batch Coordinator with a
FindBatches request.
The Batch Coordinator will search the requested offsets within its internal
state and reply with the batch coordinates (object key, offset within the
object).
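
The lookup on the consume path could look roughly like the sketch below. It
assumes the internal state for one topic-partition is a list of batches
sorted by base offset; the tuple layout and the find_batch helper are
hypothetical and only illustrate the idea of mapping an offset to batch
coordinates.

```python
from bisect import bisect_right

# Assumed state for one topic-partition, sorted by base offset:
# (base_offset, last_offset, object_key, byte_offset_within_object)
batches = [
    (0, 2, "obj-1", 0),
    (3, 4, "obj-1", 512),
    (5, 9, "obj-2", 0),
]


def find_batch(batches, offset):
    """Return (object_key, byte_offset) of the batch holding `offset`, or None."""
    bases = [b[0] for b in batches]
    # Index of the last batch whose base offset is <= the requested offset.
    i = bisect_right(bases, offset) - 1
    if i < 0:
        return None
    _base, last, object_key, byte_offset = batches[i]
    if offset > last:
        return None  # offset is past the end of the known batches
    return object_key, byte_offset


print(find_batch(batches, 4))   # ('obj-1', 512)
print(find_batch(batches, 10))  # None
```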

In your example, I suppose that A, B and C are all messages written to the
same topic-partition.
The problem you described is solved by the idempotent producer. In order to
support idempotent producers in Diskless topics, information about the
producer ID and sequence numbers must be communicated to the Batch
Coordinator when committing a new batch. We included information about the
producer (producer id and producer epoch) and the sequence numbers (base
sequence, last sequence) both in the commitFile public interface and in the
CommitBatches API. When serving a CommitBatches request that includes
idempotent producer information, the Batch Coordinator will also check
against its internal state to determine whether the produce request is a
duplicate or contains out-of-order messages.
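
To make the idea concrete, here is a simplified sketch of such a check
applied to the A, B, C scenario. It is only an assumption about how the
check might work, not the logic specified in the KIP: the function name,
the result strings, and the rule that a new producer must start at sequence
0 are all hypothetical, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class ProducerState:
    epoch: int
    last_sequence: int = -1


def check_commit(states, producer_id, epoch, base_seq, last_seq):
    """Classify a batch using per-producer epoch and sequence state."""
    st = states.get(producer_id)
    if st is None or epoch > st.epoch:
        # Assumption: a new producer or new epoch must start at sequence 0.
        if base_seq != 0:
            return "OUT_OF_ORDER"
        states[producer_id] = ProducerState(epoch, last_seq)
        return "ACCEPTED"
    if epoch < st.epoch:
        return "FENCED"
    if last_seq <= st.last_sequence:
        return "DUPLICATE"  # already committed, e.g. a retried batch
    if base_seq != st.last_sequence + 1:
        return "OUT_OF_ORDER"  # a gap: an earlier batch is still missing
    st.last_sequence = last_seq
    return "ACCEPTED"


# Scenario: A, B (sequences 0-1) were never committed via broker1.
states = {}
results = [
    check_commit(states, 7, 0, 2, 2),  # C arrives first via broker2
    check_commit(states, 7, 0, 0, 1),  # A, B retried via broker2
    check_commit(states, 7, 0, 2, 2),  # C retried
    check_commit(states, 7, 0, 0, 1),  # a late duplicate of A, B
]
print(results)  # ['OUT_OF_ORDER', 'ACCEPTED', 'ACCEPTED', 'DUPLICATE']
```

Under these assumptions the committed order is A, B, C regardless of which
broker forwarded each batch, because the check depends only on the
coordinator's state for that producer.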

Best regards,
Giuseppe

On Thu, Apr 24, 2025 at 4:24 AM yuxia <[email protected]> wrote:

> Hi!
>
> Thanks for the great work, and I'm excited to see it happen. This KIP
> looks good to me.
> The Batch Coordinator seems very important in the diskless implementation;
> could you explain the implementation in more detail? I think it would be
> much better to show what the Batch Coordinator does when a write, read, or
> other request comes in.
>
> I'm also wondering how it "chooses the total ordering for writes" and
> what the "information necessary to support idempotent producers" is.
> I'm thinking about the following case:
> 1: the client is going to send messages A, B, C to Kafka
> 2: the client sends A, B to broker1, and broker1 receives A, B
> 3: broker1 goes down, and the client sends C to broker2
> 4: since broker1 is down, the client sees A, B fail and retries sending
> A, B to broker2
> Then how can the Batch Coordinator choose the total order to be A, B, C?
>
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Ivan Yurchenko" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Wednesday, April 23, 2025 5:46:46 PM
> Subject: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator
>
> Hi all!
>
> We want to start the discussion thread for KIP-1164: Topic Based Batch
> Coordinator [1], which is a sub-KIP for KIP-1150 [2].
>
> Let's use the main KIP-1150 discuss thread [3] for high-level questions,
> motivation, and general direction of the feature and this thread for
> discussing the batch coordinator interface and the proposed topic-based
> implementation.
>
> Best,
> Ivan
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
> [2]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> [3] https://lists.apache.org/thread/ljxc495nf39myp28pmf77sm2xydwjm6d
>
