Thanks for your reply! "Normalization" may not have been the right word. What I want to refer to is:

    appendInfo.setLastOffset(offset.value - 1)

which underneath updates the base offset field (in the record batch) but not the offset delta of each record.

Best,
tison.


Justine Olshan <jols...@confluent.io.invalid> wrote on Tue, Aug 8, 2023 at 00:43:

> The sequence summary looks right to me.
> For log normalization, are you referring to compaction? The segment's first
> and last offsets might change, but a batch keeps its offsets when
> compaction occurs.
>
> Hope that helps.
> Justine
>
> On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax <mj...@apache.org> wrote:
>
> > > but the base offset may change during log normalizing.
> >
> > Not sure what you mean by "normalization" but offsets are immutable, so
> > they don't change. (To be fair, I am not an expert on brokers, so not
> > sure how this works in detail when log compaction kicks in).
> >
> > > This field is given by the producer and the broker should only read it.
> >
> > Sounds right. The point is that the broker has an "expected"
> > value for it, and if the provided value does not match the expected one,
> > the write is rejected to begin with.
> >
> >
> > -Matthias
> >
> > On 8/7/23 6:35 AM, tison wrote:
> > > Hi Matthias and Justine,
> > >
> > > Thanks for your reply!
> > >
> > > I can summarize the answer as -
> > >
> > > Record offset = base offset + offset delta. This field is calculated by
> > > the broker and the delta won't change but the base offset may change
> > > during log normalizing.
> > > Record sequence = base sequence + (offset) delta. This field is given by
> > > the producer and the broker should only read it.
> > >
> > > Is it correct?
> > >
> > > I implement the manipulation part of base offset following this
> > > understanding at [1].
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> > >
> > >
> > > Justine Olshan <jols...@confluent.io.invalid> wrote on Wed, Aug 2, 2023 at 04:19:
> > >
> > >> For what it's worth -- the sequence number is not calculated as
> > >> "baseOffset/baseSequence + offset delta" but rather increases
> > >> monotonically for a given epoch. If the epoch is bumped, we reset
> > >> back to zero.
> > >> This may mean that the offset and sequence match, but they do not
> > >> strictly need to be the same. The sequence number will also always
> > >> come from the client and be in the produce records sent to the Kafka
> > >> broker.
> > >>
> > >> As for offsets, there is some code in the log layer that maintains the
> > >> log end offset and assigns offsets to the records. The produce handling
> > >> on the leader should typically assign the offset.
> > >> I believe you can find that code here:
> > >>
> > >> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> > >>
> > >> Justine
> > >>
> > >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
> > >>
> > >>> The _offset_ is the position of the record in the partition.
> > >>>
> > >>> The _sequence number_ is a unique ID that allows the broker to
> > >>> de-duplicate messages. It requires the producer to implement the
> > >>> idempotency protocol (part of Kafka transactions); thus, sequence
> > >>> numbers are optional and as long as you don't want to support
> > >>> idempotent writes, you don't need to worry about them. (If you want
> > >>> to dig into details, check out KIP-98, which is the original KIP
> > >>> about Kafka TX).
> > >>>
> > >>> HTH,
> > >>> -Matthias
> > >>>
> > >>> On 8/1/23 2:19 AM, tison wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > >>>> models its concepts and how the core business logic works.
> > >>>>
> > >>>> While implementing the codec for Records[2], I saw a pair of fields,
> > >>>> "sequence" and "offset". Both of them are calculated by
> > >>>> baseOffset/baseSequence + offset delta. I'm a bit confused about how
> > >>>> to deal with them properly - what's the difference between these two
> > >>>> concepts logically?
> > >>>>
> > >>>> Also, to understand how the core business logic works, I wrote a
> > >>>> simple server based on my codec library, and observed that the server
> > >>>> may need to update the offset for records produced. How does Kafka set
> > >>>> the correct offset for each produced record? And how does Kafka
> > >>>> maintain the calculation for offset and sequence during these
> > >>>> modifications?
> > >>>>
> > >>>> I'd appreciate it if anyone could answer the question or give some
> > >>>> insights :D
> > >>>>
> > >>>> Best,
> > >>>> tison.
> > >>>>
> > >>>> [1] https://github.com/tisonkun/kafka-api
> > >>>> [2] https://kafka.apache.org/documentation/#messageformat
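
To make the point at the top of this message concrete, here is a minimal Rust sketch of a broker assigning a base offset on append. It only illustrates the relationship discussed in this thread (record offset = base offset + offset delta, with the deltas and the producer-supplied base sequence left untouched). The `Record` and `RecordBatch` types, their fields, and `assign_base_offset` are hypothetical names made up for illustration; they are not taken from Kafka or from the kafka-api crate, and the real batch format carries many more fields.

```rust
/// Hypothetical, heavily simplified record: only the per-record offset delta.
#[derive(Debug, Clone)]
struct Record {
    offset_delta: i32,
}

/// Hypothetical, heavily simplified record batch (not Kafka's real layout).
#[derive(Debug, Clone)]
struct RecordBatch {
    base_offset: i64,   // assigned by the broker on append
    base_sequence: i32, // supplied by the producer; the broker only reads it
    records: Vec<Record>,
}

impl RecordBatch {
    /// Absolute offset of every record: base offset + offset delta.
    fn offsets(&self) -> Vec<i64> {
        self.records
            .iter()
            .map(|r| self.base_offset + i64::from(r.offset_delta))
            .collect()
    }

    /// Rewrites only the batch-level base offset and returns the next
    /// log end offset. Offset deltas and the base sequence are not changed.
    fn assign_base_offset(&mut self, log_end_offset: i64) -> i64 {
        self.base_offset = log_end_offset;
        let last_delta = self.records.last().map_or(-1, |r| i64::from(r.offset_delta));
        log_end_offset + last_delta + 1
    }
}
```

Under these assumptions, appending a three-record batch at log end offset 100 gives the records offsets 100, 101 and 102 purely through the new base offset, while each record's delta and the batch's base sequence stay exactly as the producer sent them.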