Re: [DISCUSS] KIP-1226: Introducing Share Partition Lag Persistence and Retrieval

Andrew Schofield Tue, 21 Oct 2025 06:14:37 -0700

Hi Chirag,
The KIP looks good from my point of view now.

Thanks,
Andrew


> On 17 Oct 2025, at 08:40, CHIRAG WADHWA <[email protected]> wrote:
>
> Hi Apoorv,
> Thanks for the suggestions. Please find my responses below:
>
> AM1: That was a small oversight on my part. Yes, for regular consumers, lag
> is always calculated using the read_uncommitted isolation level. I’ve
> updated the KIP to specify that the share partition lag calculation will
> also use the Log End Offset (LEO), obtained with the read_uncommitted
> isolation level, as the upper bound.
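A minimal sketch of how such a lag figure could be computed, assuming lag is the number of offsets from the SPSO up to the read_uncommitted upper bound, minus the offsets already known to have completed delivery (the DeliveryCompleteCount discussed later in this thread). The class and method names are hypothetical and are not part of any Kafka API. The exact formula is also an assumption, not text taken from the KIP.

```java
final class ShareLagSketch {
    // Hypothetical helper: lag = (upperBound - spso) - deliveryCompleteCount.
    // upperBound is assumed to be the exclusive end offset obtained with
    // read_uncommitted (the LEO referred to above).
    static long shareLag(long upperBound, long spso, long deliveryCompleteCount) {
        if (spso > upperBound) {
            throw new IllegalArgumentException("SPSO is beyond the upper bound");
        }
        long lag = (upperBound - spso) - deliveryCompleteCount;
        // Clamp so a transiently inconsistent count never yields negative lag.
        return Math.max(0L, lag);
    }

    public static void main(String[] args) {
        // 100 offsets after the SPSO, 30 of which have completed delivery.
        System.out.println(shareLag(1100L, 1000L, 30L)); // prints 70
    }
}
```

Because the upper bound always comes from read_uncommitted, switching the group's isolation level would not change the reported lag, which is the parity with consumer groups raised in AM1.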
>
> AM2: All offsets that lie before the Share Partition Start Offset (SPSO)
> are considered processed by the share consumers, whereas all offsets beyond
> the Share Partition End Offset (SPEO) are treated as candidates for future
> processing, regardless of whether they correspond to control records,
> compacted records, or regular records. However, within the range between
> SPSO and SPEO, there may be cases where certain control or compacted
> records have already been identified and excluded from the share partition
> lag, while others are yet to be recognized and still contribute to it. This
> nuanced handling of offsets is what differentiates share consumption from
> regular consumption, which is why I wanted to highlight it.
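The three regions described above can be sketched as follows. This assumes the SPEO is inclusive (the window is [SPSO, SPEO]) and models "already identified" offsets as a plain set; both the names and that representation are hypothetical illustrations, not the broker's implementation.

```java
import java.util.Set;

final class OffsetRegionSketch {
    enum Region { BEFORE_SPSO, IN_FLIGHT_WINDOW, BEYOND_SPEO }

    // Assumes the in-flight window is [spso, speo], inclusive at both ends.
    static Region regionOf(long offset, long spso, long speo) {
        if (offset < spso) return Region.BEFORE_SPSO;
        if (offset <= speo) return Region.IN_FLIGHT_WINDOW;
        return Region.BEYOND_SPEO;
    }

    // Before the SPSO: already processed, never counted.
    // Beyond the SPEO: always counted, whatever the record type.
    // Inside the window: excluded only if already identified as
    // delivery-complete (e.g. a recognized control record or compacted gap).
    static boolean countsTowardLag(long offset, long spso, long speo,
                                   Set<Long> identifiedComplete) {
        switch (regionOf(offset, spso, speo)) {
            case BEFORE_SPSO:
                return false;
            case BEYOND_SPEO:
                return true;
            default:
                return !identifiedComplete.contains(offset);
        }
    }
}
```

An offset inside the window that is really a control record but has not yet been recognized still returns true here, which matches the point that some such offsets temporarily contribute to the lag.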
>
> AM3: I’ve updated the KIP to move this part under the Motivation section.
>
> Regards,
> Chirag Wadhwa
>
> On Thu, 16 Oct 2025 at 21:58, Apoorv Mittal <[email protected]> wrote:
>
>> Hi Chirag,
>> Thanks for the KIP; this is a very helpful feature to have for share
>> groups. Some comments on the KIP:
>>
>> AM1: Although supporting both read_committed and read_uncommitted
>> isolation levels when determining the highest offset makes complete sense,
>> I was wondering whether the lag might change if the group's isolation
>> level is switched, which might confuse customers. Also, consumer groups
>> compute lag using read_uncommitted, so maybe we should keep parity with
>> consumer groups and keep it simple for share groups as well by using
>> read_uncommitted. wdyt?
>>
>> AM2: I am not sure what we mean by the following text: "However, offsets
>> within the in-flight boundary (between SPSO and SPEO) require additional
>> handling so that the lag more accurately reflects the number of records to
>> be processed." Could you please clarify?
>>
>> AM3: I'm not sure whether this text belongs in the Persistence section or
>> should go in Motivation: "Looking ahead, the plan is to implement an
>> assignor that allocates members to partitions based on partition-level
>> backlogs."
>>
>>
>> Regards,
>> Apoorv Mittal
>>
>>
>> On Wed, Oct 15, 2025 at 12:23 PM Chirag Wadhwa <[email protected]> wrote:
>>
>>> Hi Andrew,
>>> Thanks for the suggestions.
>>>
>>> Regarding the first point, the KIP has been updated to include a new
>>> subsection covering control records as well as compacted records.
>>> Regarding the second point, I personally prefer DeliveryCompleteCount.
>>> The schemas in the KIP have also been updated accordingly.
>>>
>>> Thanks,
>>> Chirag
>>>
>>> On Tue, Oct 14, 2025 at 6:45 PM Andrew Schofield <[email protected]> wrote:
>>>
>>>> Hi Chirag,
>>>> Thanks for the KIP. I have a few comments.
>>>>
>>>> AS1: The calculation of the lag needs to take into account offsets which
>>>> are not occupied by records, such as when they’ve been removed due to
>>>> compaction. Also, the offsets which correspond to control records need to
>>>> be taken into account. Please update the text to make this clear.
>>>>
>>>> AS2: The name InFlightTerminalRecords in the schemas seems a bit strange
>>>> to me. What you are doing is calculating the offsets after the SPSO for
>>>> which delivery is complete, either because the records are acknowledged
>>>> or archived, or because they are control records, or because the offsets
>>>> do not correspond to records at all. Personally, I only think of the
>>>> in-flight records as being those between the SPSO and the SPEO which have
>>>> one of the delivery states, not those which never did.
>>>>
>>>> I’ve been very careful to exclude the SPEO from the external interfaces,
>>>> because one day I expect to change the code so that the in-flight records
>>>> are sparse and the distance between the SPSO and the SPEO can be much
>>>> greater. The concept of lag in this KIP needs to be flexible enough to
>>>> accommodate this.
>>>>
>>>> I wonder whether a name like DeliveryCompleteCount or
>>>> DeliveryCompleteRecords instead of InFlightTerminalRecords would be
>>>> better. This is the number of offsets after the SPSO for which the
>>>> records have completed delivery, either because they’re in a terminal
>>>> state, or because no delivery is required. Wdyt?
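A minimal sketch of the proposed DeliveryCompleteCount, assuming the broker tracks a state per known offset. The State enum here is hypothetical: it folds "control record" and "no record at this offset" into a single NO_DELIVERY_REQUIRED value and is not Kafka's internal record-state type.

```java
import java.util.Map;

final class DeliveryCompleteCountSketch {
    // Hypothetical per-offset states, for illustration only.
    enum State { AVAILABLE, ACQUIRED, ACKNOWLEDGED, ARCHIVED, NO_DELIVERY_REQUIRED }

    // Counts offsets at or after the SPSO whose delivery is complete: either a
    // terminal state (acknowledged or archived) or no delivery required
    // (control records, offsets removed by compaction).
    static long deliveryCompleteCount(long spso, Map<Long, State> knownOffsets) {
        long count = 0;
        for (Map.Entry<Long, State> e : knownOffsets.entrySet()) {
            if (e.getKey() < spso) {
                continue; // before the SPSO: not part of the count
            }
            State s = e.getValue();
            if (s == State.ACKNOWLEDGED || s == State.ARCHIVED
                    || s == State.NO_DELIVERY_REQUIRED) {
                count++;
            }
        }
        return count;
    }
}
```

The count never references the SPEO: it only needs the SPSO and whatever offsets are being tracked, so it would still work if the in-flight records became sparse, as anticipated above.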
>>>>
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>> On 9 Oct 2025, at 14:43, CHIRAG WADHWA <[email protected]> wrote:
>>>>>
>>>>> I'd like to start the discussion for KIP-1226: Introducing Share
>>>>> Partition Lag Persistence and Retrieval.
>>>>>
>>>>> KIP Wiki:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1226:+Introducing+Share+Partition+Lag+Persistence+and+Retrieval
>>>>>
>>>>> Regards,
>>>>> Chirag Wadhwa
>>>>
>>>>
>>>
>>> --
>>>
>>> Chirag Wadhwa
>>> Software Engineer, Confluent
>>
