Hi Matthias and Justine,

Thanks for your reply!

I can summarize the answer as follows:

Record offset = base offset + offset delta. This field is calculated by the
broker; the delta won't change, but the base offset may change during log
normalization.
Record sequence = base sequence + (offset) delta. This field is given by
the producer, and the broker should only read it.

Is it correct?

I implemented the base offset manipulation following this understanding
at [1].
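
As a rough sketch of this understanding (the types and names below are
hypothetical and simplified for illustration; they are not the actual code
at [1], nor Kafka's implementation): the broker overwrites only the base
offset from its log end offset, while the deltas and the producer-supplied
base sequence are left untouched.

```rust
/// Hypothetical, simplified record batch header (not the real codec types).
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    base_offset: i64,
    /// Supplied by the producer; the broker only reads it.
    base_sequence: i32,
    /// Per-record offset deltas, relative to `base_offset`.
    offset_deltas: Vec<i32>,
}

impl RecordBatch {
    /// Absolute offset of the record at `index`: base offset + offset delta.
    fn record_offset(&self, index: usize) -> i64 {
        self.base_offset + i64::from(self.offset_deltas[index])
    }
}

/// Hypothetical partition log that tracks only the log end offset.
struct PartitionLog {
    log_end_offset: i64,
}

impl PartitionLog {
    /// Assign offsets to an incoming batch by overwriting its base offset.
    /// Deltas and the base sequence are not modified.
    fn append(&mut self, mut batch: RecordBatch) -> RecordBatch {
        batch.base_offset = self.log_end_offset;
        // Advance past the last record: last offset delta + 1.
        let advance = batch
            .offset_deltas
            .last()
            .map_or(0, |delta| i64::from(*delta) + 1);
        self.log_end_offset += advance;
        batch
    }
}
```

For example, appending a batch with deltas 0, 1, 2 to a log whose end
offset is 100 gives record offsets 100, 101, 102 and moves the log end
offset to 103; the base sequence stays whatever the producer sent.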

Best,
tison.

[1]
https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394


On Wed, Aug 2, 2023 at 04:19, Justine Olshan <jols...@confluent.io.invalid> wrote:

> For what it's worth -- the sequence number is not calculated as
> "baseOffset/baseSequence + offset delta"; rather, it increases
> monotonically for a given epoch. If the epoch is bumped, we reset back to
> zero.
> This may mean that the offset and sequence may match, but do not strictly
> need to be the same. The sequence number will also always come from the
> client and be in the produce records sent to the Kafka broker.
>
> As for offsets, there is some code in the log layer that maintains the log
> end offset and assigns offsets to the records. The produce handling on the
> leader should typically assign the offset.
> I believe you can find that code here:
>
> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
>
> Justine
>
> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
>
> > The _offset_ is the position of the record in the partition.
> >
> > The _sequence number_ is a unique ID that allows the broker to
> > de-duplicate messages. It requires the producer to implement the
> > idempotency protocol (part of Kafka transactions); thus, sequence
> > numbers are optional, and as long as you don't want to support
> > idempotent writes, you don't need to worry about them. (If you want to
> > dig into details, check out KIP-98, which is the original KIP about
> > Kafka TX.)
> >
> > HTH,
> >    -Matthias
> >
> > On 8/1/23 2:19 AM, tison wrote:
> > > Hi,
> > >
> > > > I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > > > models its concepts and how the core business logic works.
> > >
> > > > While implementing the codec for Records[2], I saw a pair of fields,
> > > > "sequence" and "offset". Both of them are calculated by
> > > > baseOffset/baseSequence + offset delta, so I'm a bit confused about
> > > > how to deal with them properly - what's the logical difference
> > > > between these two concepts?
> > >
> > > > Also, to understand how the core business logic works, I wrote a
> > > > simple server based on my codec library, and observed that the server
> > > > may need to update the offsets of produced records. How does Kafka
> > > > set the correct offset for each produced record? And how does Kafka
> > > > maintain the calculation for offset and sequence during these
> > > > modifications?
> > >
> > > > I'd appreciate it if anyone could answer the question or give some
> > > > insights :D
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://github.com/tisonkun/kafka-api
> > > [2] https://kafka.apache.org/documentation/#messageformat
> > >
> >
>
