Re: [DISCUSS] KIP-1226: Introducing Share Partition Lag Persistence and Retrieval

CHIRAG WADHWA Fri, 17 Oct 2025 00:42:25 -0700

Hi Apoorv,
Thanks for the suggestions. Please find my responses below:

AM1: That was an oversight on my part. Yes, for regular consumers, lag
is always calculated using the read_uncommitted isolation level. I’ve
updated the KIP to specify that the share partition lag calculation will
also use the Log End Offset (LEO), obtained using the read_uncommitted
isolation level, as the upper bound.


AM2: All offsets that lie before the Share Partition Start Offset (SPSO)
are considered processed by the share consumers, whereas all offsets beyond
the Share Partition End Offset (SPEO) are treated as candidates for future
processing, regardless of whether they correspond to control records,
compacted records, or regular records. However, within the range between
SPSO and SPEO, there may be cases where certain control or compacted
records have already been identified and excluded from the share partition
lag, while others are yet to be recognized and still contribute to it. This
nuanced handling of offsets is what differentiates share consumption from
regular consumption, which is why I wanted to highlight it.
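
To make the arithmetic concrete, here is a minimal sketch in Java. It
assumes the lag is the LEO minus the SPSO, minus the number of offsets at or
after the SPSO whose delivery is already complete (the DeliveryCompleteCount
discussed earlier in this thread). The class and method names are
hypothetical and are not taken from the KIP or the Kafka codebase.

```java
/**
 * Illustrative sketch only. The names here are hypothetical and do not
 * correspond to classes or methods in Apache Kafka.
 */
final class SharePartitionLagSketch {

    /**
     * Assumed formula: lag = LEO - SPSO - deliveryCompleteCount.
     *
     * @param logEndOffset          LEO, read with read_uncommitted isolation
     * @param spso                  Share Partition Start Offset
     * @param deliveryCompleteCount offsets at or after the SPSO that need no
     *                              further delivery (acknowledged, archived,
     *                              control records, or compacted-away offsets)
     */
    static long lag(long logEndOffset, long spso, long deliveryCompleteCount) {
        if (logEndOffset < 0 || spso < 0 || deliveryCompleteCount < 0) {
            throw new IllegalArgumentException("offsets and counts must be non-negative");
        }
        long outstanding = logEndOffset - spso - deliveryCompleteCount;
        // Clamp at zero so a stale count can never produce a negative lag.
        return Math.max(0L, outstanding);
    }

    public static void main(String[] args) {
        // SPSO = 100 and LEO = 150, so 50 offsets lie in [SPSO, LEO).
        // 12 of them are already delivery-complete, leaving 38.
        System.out.println(lag(150L, 100L, 12L)); // prints 38
    }
}
```

Offsets between the SPSO and the SPEO that have not yet been recognized as
control or compacted records are simply absent from the count, so they still
contribute to the lag until they are identified.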

AM3: I’ve updated the KIP to move this part under the Motivation section.

Regards,
Chirag Wadhwa

On Thu, 16 Oct 2025 at 21:58, Apoorv Mittal <[email protected]>
wrote:

> Hi Chirag,
> Thanks for the KIP, this is a very helpful feature to have for the share
> groups. Some comments on the KIP:
>
> AM1: Though having read_committed and read_uncommitted isolation levels
> while determining the highest offset makes complete sense, I was
> wondering whether lag might change if the group isolation level is
> switched, which might confuse customers. Also, consumer groups compute lag
> using read_uncommitted, so maybe we just keep parity with consumer groups
> and keep it simple for share groups as well by using read_uncommitted.
> wdyt?
>
> AM2: I am not sure what we mean by the following text: "However, offsets
> within the in-flight boundary (between SPSO and SPEO) require additional
> handling so that the lag more accurately reflects the number of records to
> be processed." Can you please clarify?
>
> AM3: Not sure if this text belongs in the Persistence section or should go
> in Motivation: "Looking ahead, the plan is to implement an assignor that
> allocates members to partitions based on partition-level backlogs."
>
>
> Regards,
> Apoorv Mittal
>
>
> On Wed, Oct 15, 2025 at 12:23 PM Chirag Wadhwa
> <[email protected]>
> wrote:
>
> > Hi Andrew,
> > Thanks for the suggestions.
> >
> > Regarding the first point, the KIP has been updated to include a new
> > subsection that covers control records as well as compacted records.
> > Regarding the second point, I personally prefer DeliveryCompleteCount.
> > The schemas in the KIP have also been updated accordingly.
> >
> > Thanks,
> > Chirag
> >
> > On Tue, Oct 14, 2025 at 6:45 PM Andrew Schofield <
> > [email protected]>
> > wrote:
> >
> > > Hi Chirag,
> > > Thanks for the KIP. I have a few comments.
> > >
> > > AS1: The calculation of the lag needs to take into account offsets
> > > which are not occupied by records, such as when they’ve been removed
> > > due to compaction. Also, the offsets which correspond to control
> > > records need to be taken into account. Please update the text to make
> > > this clear.
> > >
> > > AS2: The name InFlightTerminalRecords in the schemas seems a bit
> > > strange to me. What you are doing is calculating the offsets after the
> > > SPSO for which delivery is complete, either because the records are
> > > acknowledged or archived, or because they are control records, or
> > > because the offsets do not correspond to records at all. Personally, I
> > > only think of the in-flight records as being those between the SPSO
> > > and the SPEO which have one of the delivery states, not those which
> > > never did.
> > >
> > > I’ve been very careful to exclude the SPEO from the external
> > > interfaces, because one day I expect to change the code so that the
> > > in-flight records are sparse and the distance between the SPSO and the
> > > SPEO can be much greater. The concept of lag in this KIP needs to be
> > > flexible enough to accommodate this.
> > >
> > > I wonder whether a name like DeliveryCompleteCount or
> > > DeliveryCompleteRecords instead of InFlightTerminalRecords would be
> > > better. This is the number of offsets after the SPSO for which the
> > > records have completed delivery, either because they’re in a terminal
> > > state, or because no delivery is required. Wdyt?
> > >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 9 Oct 2025, at 14:43, CHIRAG WADHWA <[email protected]> wrote:
> > > >
> > > > I'd like to start the discussion for KIP-1226: Introducing Share
> > > > Partition Lag Persistence and Retrieval.
> > > >
> > > > KIP Wiki:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1226:+Introducing+Share+Partition+Lag+Persistence+and+Retrieval
> > > >
> > > > Regards,
> > > > Chirag Wadhwa
> > >
> > >
> >
>
