Hi Matthias and Justine,

Thanks for your reply!

I can summarize the answer as:

Record offset = base offset + offset delta. This field is calculated by the broker; the delta won't change, but the base offset may change during log normalizing.

Record sequence = base sequence + (offset) delta. This field is given by the producer, and the broker should only read it.

Is that correct?

I implemented the base offset manipulation following this understanding at [1].

Best,
tison.

[1] https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394

Justine Olshan <jols...@confluent.io.invalid> wrote on Wed, Aug 2, 2023 at 04:19:

> For what it's worth -- the sequence number is not calculated
> "baseOffset/baseSequence + offset delta" but rather by monotonically
> increasing for a given epoch. If the epoch is bumped, we reset back to
> zero.
> This may mean that the offset and sequence may match, but do not strictly
> need to be the same. The sequence number will also always come from the
> client and be in the produce records sent to the Kafka broker.
>
> As for offsets, there is some code in the log layer that maintains the log
> end offset and assigns offsets to the records. The produce handling on the
> leader should typically assign the offset.
> I believe you can find that code here:
>
> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
>
> Justine
>
> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
>
> > The _offset_ is the position of the record in the partition.
> >
> > The _sequence number_ is a unique ID that allows the broker to de-duplicate
> > messages. It requires the producer to implement the idempotency protocol
> > (part of Kafka transactions); thus, sequence numbers are optional and as
> > long as you don't want to support idempotent writes, you don't need to
> > worry about them. (If you want to dig into details, check out KIP-98, which
> > is the original KIP about Kafka TX.)
> >
> > HTH,
> > -Matthias
> >
> > On 8/1/23 2:19 AM, tison wrote:
> > > Hi,
> > >
> > > I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > > models its concepts and how the core business logic works.
> > >
> > > While implementing the codec for Records[2], I saw a pair of fields,
> > > "sequence" and "offset". Both of them are calculated by
> > > baseOffset/baseSequence + offset delta. Then I'm a bit confused about how
> > > to deal with them properly - what's the difference between these two
> > > concepts logically?
> > >
> > > Also, to understand how the core business logic works, I wrote a simple
> > > server based on my codec library, and observed that the server may need
> > > to update the offset for produced records. How does Kafka set the correct
> > > offset for each produced record? And how does Kafka maintain the
> > > calculation for offset and sequence during these modifications?
> > >
> > > I'd appreciate it if anyone can answer the question or give some
> > > insights :D
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://github.com/tisonkun/kafka-api
> > > [2] https://kafka.apache.org/documentation/#messageformat
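
For anyone following the thread, the summary at the top of this message can be sketched roughly as below. This is only an illustration of my understanding, not code from kafka-api or from Kafka itself: the `Batch` type and the `append` function are hypothetical names. The sketch assumes a base sequence of -1 means the producer is not idempotent, and it ignores sequence wraparound.

```rust
/// Hypothetical, simplified view of a record batch (illustration only).
#[derive(Debug, Clone)]
pub struct Batch {
    /// Offset of the first record; (re)assigned by the broker on append.
    pub base_offset: i64,
    /// Sequence of the first record; set by the producer and only read by
    /// the broker. Assumed to be -1 when idempotence is not in use.
    pub base_sequence: i32,
    /// Per-record offset deltas relative to the start of the batch.
    pub offset_deltas: Vec<i32>,
}

impl Batch {
    /// Record offset = base offset + offset delta.
    pub fn record_offset(&self, i: usize) -> i64 {
        self.base_offset + i64::from(self.offset_deltas[i])
    }

    /// Record sequence = base sequence + offset delta, or None when the
    /// batch carries no sequence. Wraparound is deliberately ignored here.
    pub fn record_sequence(&self, i: usize) -> Option<i32> {
        if self.base_sequence < 0 {
            return None;
        }
        Some(self.base_sequence + self.offset_deltas[i])
    }
}

/// Broker-side append: stamps the batch with the current log end offset and
/// returns the new log end offset. The deltas and the base sequence are left
/// untouched.
pub fn append(log_end_offset: i64, batch: &mut Batch) -> i64 {
    batch.base_offset = log_end_offset;
    match batch.offset_deltas.last() {
        Some(last_delta) => log_end_offset + i64::from(*last_delta) + 1,
        None => log_end_offset,
    }
}
```

With a log end offset of 100 and a three-record batch whose base sequence is 5, `append` moves the record offsets to 100..=102 while the sequences stay at 5..=7 as sent by the producer.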