Hi Lianet,
Thanks for the feedback. I have incorporated LM1 and LM2.

For LM3 and LM4: the behavior is no different from what we have currently,
where the poll results are buffered on the client side. If an application
renews records continuously, the renewed records will get added to the
existing fetch buffer for the share consumer and will not increase the
memory footprint.
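
To make this concrete, here is a rough sketch of the application loop I
have in mind (consumer configured for explicit acknowledgement).
AcknowledgeType.RENEW is the new acknowledge type proposed in the KIP;
props, running and stillProcessing() are illustrative placeholders:

    import java.time.Duration;
    import java.util.List;
    import org.apache.kafka.clients.consumer.AcknowledgeType;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaShareConsumer;
    import org.apache.kafka.clients.consumer.ShareConsumer;

    try (ShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
        consumer.subscribe(List.of("long-jobs"));
        while (running) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                if (stillProcessing(record)) {
                    // RENEW extends the acquisition lock; the record is
                    // re-delivered through the existing fetch buffer on a
                    // later poll(), so no second buffer is introduced.
                    consumer.acknowledge(record, AcknowledgeType.RENEW);
                } else {
                    consumer.acknowledge(record, AcknowledgeType.ACCEPT);
                }
            }
            consumer.commitSync();
        }
    }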

Regards,
Sushant Mahajan


On Wed, 15 Oct 2025, 15:58 Sushant Mahajan, <[email protected]> wrote:

> Hi Sanskar,
> Thanks for the feedback. I have added an additional section to the proposed
> changes to highlight the issue and some debugging actions.
>
> Regards,
> Sushant Mahajan
>
> On Wed, 15 Oct 2025, 01:53 Lianet M., <[email protected]> wrote:
>
>> Hi Sushant, thanks for the KIP.
>>
>> LM1: About the validation for brokers that may not support this
>> feature, the KIP proposes to validate on the "acknowledge" call, but
>> that does not build any request or have any relationship with the
>> NetworkClient (where the ApiVersions info lives), so how exactly are
>> we imagining we would have the node version to check on acknowledge?
>> It's usually when we build the request that we have the info handy, to
>> check if the Node version is not what the client needs for a feature
>> it is using. So here, I expect, it would mean that the error would be
>> discovered on the call to poll or commit following the acknowledge. I do
>> agree that it makes more sense to throw it on ack, but just pointing
>> out the challenge to make sure we thought about it.
>>
>> LM2: About the IllegalArgumentException when using an ack type not
>> supported by the broker, I wonder if it should be
>> UnsupportedVersionException? I think this is what we consistently
>> throw when the client attempts to use a feature not supported by the
>> broker version. It's actually a common pattern on the client (related
>> to both comments LM1 and LM2): check the node version when building
>> the request and throw UnsupportedVersionException depending on the
>> feature being used, e.g.:
>>
>> https://github.com/apache/kafka/blob/4cb5f1ca124b810b32fcef1ca23f00faf70603b6/clients/src/main/java/org/apache/kafka/common/requests/ListGroupsRequest.java#L52
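>>
>> For reference, a simplified sketch of that build-time check applied to
>> the new acknowledge type (containsRenew() and RENEW_MIN_VERSION are
>> illustrative names, not from the KIP):
>>
>> public ShareAcknowledgeRequest build(short version) {
>>     // Same pattern as ListGroupsRequest: fail at request-build time
>>     // when the negotiated version is too old for a requested feature.
>>     if (containsRenew(data) && version < RENEW_MIN_VERSION) {
>>         throw new UnsupportedVersionException(
>>             "The broker only supports ShareAcknowledge v" + version +
>>             ", but v" + RENEW_MIN_VERSION + " or newer is required" +
>>             " for the RENEW acknowledge type.");
>>     }
>>     return new ShareAcknowledgeRequest(data, version);
>> }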
>>
>> LM3: About the client-side buffering proposed to be able to return the
>> renewed records on poll, I wonder if we should be concerned about
>> potential memory issues? I expect that an app would easily end up
>> calling RENEW continuously (common if it's failing to process every
>> record for a period of time?). In that case, how would that buffer be
>> bound? The KIP mentions that it would keep records "until they are
>> re-delivered by the broker", so does it mean that the buffer would
>> continue growing until the client app is able to ACCEPT or REJECT
>> records?
>>
>> LM4: On the new client-side buffer too, did we consider the
>> alternative of no extra buffering on the client, but just have the
>> broker re-send the renewed records to the member that has them? I'm
>> probably missing some broker-side implications of the share fetch, but
>> curious to understand the trade-offs (looking at it from the client
>> side, my first reaction is to avoid adding another buffering on the
>> client if possible, due to the complexities and implications, like
>> memory issues).
>>
>> Thanks!
>> Lianet
>>
>> On Tue, Oct 14, 2025 at 6:14 AM Sushant Mahajan <[email protected]>
>> wrote:
>> >
>> > Hi Andrew,
>> > I have modified the KIP to throw IllegalArgumentException on unsupported
>> > ack type.
>> >
>> > Regards,
>> > Sushant Mahajan
>> >
>> > On Mon, 13 Oct 2025, 18:35 Sanskar Jhajharia,
>> > <[email protected]> wrote:
>> >
>> > > Hey Sushant,
>> > > Thanks for the KIP. Overall, the KIP looks like a good enhancement to
>> the
>> > > Share Consumer interface and helps unlock a significant use case. A
>> couple
>> > > of questions or suggestions:
>> > >
>> > > SJ1: If a record is a "poison pill" (a record that consistently
>> causes a
>> > > consumer to require a very long processing timeout), a malicious or
>> buggy
>> > > consumer could be programmed to continuously renew the lock to
>> prevent it
>> > > from ever reaching a failure state on the consumer and being moved to
>> a
>> > > REJECTED or ARCHIVED state in the broker. Unlimited renewals can
>> create a
>> > > self-inflicted, persistent Head-of-Line (HOL) blocking issue for the
>> entire
>> > > share consumer group. Are there broker-side limits (e.g., a maximum
>> number
>> > > of renewals or a maximum cumulative lock duration) that can be
>> configured
>> > > to force a difficult record to eventually transition?
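>> > >
>> > > For instance, guardrails along these lines (config names purely
>> > > illustrative, loosely modeled on the existing
>> > > group.share.record.lock.duration.ms):
>> > >
>> > > group.share.record.lock.max.renewals=5
>> > > group.share.record.lock.max.cumulative.duration.ms=300000
>> > >
>> > > Once either limit is hit, further RENEW attempts would be rejected
>> > > and the record would proceed down the normal delivery-count path.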
>> > >
>> > > SJ2: This is somewhat linked to comment `SJ1`. Would it make sense
>> to
>> > > improve visibility into lock renewal activity for better debugging and
>> > > operational awareness? For example, exposing broker-level metrics or
>> > > diagnostics that help operators identify cases of frequent or
>> ineffective
>> > > renewals could be valuable in understanding consumer or record-level
>> > > issues. These metrics may be essential for operators to debug
>> long-running
>> > > jobs, identify bottlenecks, and properly configure consumer timeouts.
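>> > >
>> > > For example (metric names purely illustrative), per-share-partition
>> > > metrics along the lines of:
>> > >
>> > > kafka.server:type=SharePartitionMetrics,name=RecordLockRenewalRate
>> > > kafka.server:type=SharePartitionMetrics,name=ExcessiveRenewalCount
>> > >
>> > > would let operators spot records that are renewed repeatedly without
>> > > ever being accepted, released, or rejected.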
>> > >
>> > > Regards,
>> > > Sanskar
>> > >
>> > > On Tue, Oct 7, 2025 at 3:46 PM Sushant Mahajan <[email protected]>
>> > > wrote:
>> > >
>> > > > I’d like to start the discussion for KIP-1222: Acquisition lock
>> timeout
>> > > > renewal in share consumer explicit mode.
>> > > >
>> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-19742
>> > > > KIP Wiki:
>> > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1222%3A+Acquisition+lock+timeout+renewal+in+share+consumer+explicit+mode
>> > > >
>> > > > Regards,
>> > > > Sushant Mahajan
>> > > >
>> > >
>>
>
