Hi Luke & Justine,
Thanks for looking into this issue; we have been experiencing it because of
rogue clients as well.

> 3. Having a limit to the number of active producer IDs (sort of like an
> LRU cache)
> -> The idea here is that if we hit a misconfigured client, we will expire
> the older entries. The concern here is that we risk losing idempotency
> guarantees, and currently we don't have a way to notify clients about
> losing idempotency guarantees. Besides, the least recently used entries
> that get removed are not always from the "bad" clients.

- I have some concerns about the impact of this option on transactional
producers. For example, what will happen to an ongoing transaction
associated with an expired PID? Would this leave the transaction in a
"hanging" state?

- How will we notify the client that the transaction can't continue due to
an expired PID?

- If a PID gets marked as `expired`, `admin.DescribeProducers` will no longer
list it, which will make *`kafka-transactions.sh --list`* a bit tricky, as we
can't identify whether there are transactions linked to the expired PID. The
same concern applies to *`kafka-transactions.sh --find-hanging`*.
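To make these concerns concrete, here is a minimal Java sketch of option 3 as I understand it: an LRU-bounded map of active PIDs built on `LinkedHashMap` in access order. It is not Kafka's `ProducerStateManager`; `LruProducerStateCache`, `ProducerEntry`, `touch` and the `clientId`/`ongoingTxn` fields are hypothetical names for illustration only. It shows that eviction only looks at recency, so a good client's transactional PID can be dropped when a rogue client floods the cache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of option 3 (an LRU-bounded set of active producer IDs).
// This is NOT Kafka's ProducerStateManager; all names here are made up for the example.
public class LruProducerStateCache {

    // Minimal per-PID state; clientId and ongoingTxn are illustrative fields only.
    public record ProducerEntry(long producerId, String clientId, boolean ongoingTxn) {}

    private final int maxEntries;
    private final List<ProducerEntry> evicted = new ArrayList<>();
    private final LinkedHashMap<Long, ProducerEntry> entries;

    public LruProducerStateCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used PID.
        this.entries = new LinkedHashMap<Long, ProducerEntry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, ProducerEntry> eldest) {
                boolean overCapacity = size() > LruProducerStateCache.this.maxEntries;
                if (overCapacity) {
                    // The LRU entry is dropped regardless of which client owns it
                    // or whether it still has an ongoing transaction.
                    evicted.add(eldest.getValue());
                }
                return overCapacity;
            }
        };
    }

    // Record activity for a PID (e.g. a produce request); refreshes its LRU position.
    public void touch(ProducerEntry entry) {
        entries.put(entry.producerId(), entry);
    }

    // containsKey does not count as an access, so it does not change LRU order.
    public boolean contains(long producerId) {
        return entries.containsKey(producerId);
    }

    public List<ProducerEntry> evicted() {
        return evicted;
    }

    public static void main(String[] args) {
        LruProducerStateCache cache = new LruProducerStateCache(3);
        // A well-behaved transactional producer obtains a PID first...
        cache.touch(new ProducerEntry(1L, "good-txn-app", true));
        // ...then a misconfigured client creates several short-lived idempotent producers.
        for (long pid = 100; pid < 103; pid++) {
            cache.touch(new ProducerEntry(pid, "rogue-app", false));
        }
        System.out.println(cache.contains(1L)); // false: the good client's PID was evicted
        System.out.println(cache.evicted());    // the evicted entry still had ongoingTxn=true
    }
}
```

Whatever the broker then does with the evicted entry's open transaction is exactly the open question above.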


> 5. Limit/throttle the producer ID based on the principle
> -> Although we can limit the impact to a certain principle with this idea,
> the same concern as solutions #1 and #2 still exists.

I am assuming you mean KafkaPrincipal here. If so, is your concern that good
clients using the same principal as a rogue one will get throttled?

If that is the case, then I believe it should be okay, as other throttling
in Kafka on *`/config/users/<user>`* has the same behaviour.


What about applying the limit/throttling to
*`/config/users/<user>/clients/<client-id>`*, similar to what we have with
client quotas? This should reduce the concern about throttling good clients,
right?

best,

Omnia

On Tue, Oct 11, 2022 at 4:18 AM Luke Chen <show...@gmail.com> wrote:

> Bump this thread to see if there are any comments/thoughts.
> Thanks.
>
> Luke
>
> On Mon, Sep 26, 2022 at 11:06 AM Luke Chen <show...@gmail.com> wrote:
>
> > Hi devs,
> >
> > As stated in the motivation section in KIP-854
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry>:
> >
> > With idempotent producers becoming the default in Kafka, unless otherwise
> > specified, all new producers will be given producer IDs. Some (inefficient)
> > applications may now create many non-transactional idempotent producers.
> > Each of these producers will be assigned a producer ID, and these IDs and
> > their metadata are stored in broker memory, which might cause brokers to
> > run out of memory.
> >
> > Justine (in cc), some other team members, and I are working on solutions
> > for this issue, but none of them solves it completely without side
> > effects. The trade-off we can't decide on is which to sacrifice:
> > "availability" or "idempotency guarantees". Some of these solutions
> > sacrifice produce availability (1, 2, 5) and others sacrifice idempotency
> > guarantees (3). It would be useful to know whether people generally have
> > a preference one way or the other, or whether there are better solutions.
> >
> > Here are the proposals we came up with:
> >
> > 1. Limit the total number of active producer ID allocations.
> > -> This is the simplest solution, but since the OOM issue is usually
> > caused by a rogue or misconfigured client, this solution might "punish"
> > good clients by preventing them from sending messages.
> >
> > 2. Throttle the producer ID allocation rate
> > -> Same concern as solution #1.
> >
> > 3. Having a limit to the number of active producer IDs (sort of like an
> > LRU cache)
> > -> The idea here is that if we hit a misconfigured client, we will expire
> > the older entries. The concern here is that we risk losing idempotency
> > guarantees, and currently we don't have a way to notify clients about
> > losing idempotency guarantees. Besides, the least recently used entries
> > that get removed are not always from the "bad" clients.
> >
> > 4. Allow clients to "close" the producer ID usage
> > -> We can provide a way for a producer to "close" its producer ID usage.
> > Currently, we only have INIT_PRODUCER_ID, which is requested to allocate
> > one. After that, the broker keeps the producer ID metadata even if the
> > producer is "closed". With a close API (e.g. END_PRODUCER_ID), we could
> > remove the entry on the broker side; on the client side, the producer
> > would send it when closing. The concern is that old clients (including
> > non-Java clients) will still suffer from the OOM issue.
> >
> > 5. Limit/throttle the producer ID based on the principle
> > -> Although we can limit the impact to a certain principle with this idea,
> > the same concern as solutions #1 and #2 still exists.
> >
> > Any thoughts/feedback are welcome.
> >
> > Thank you.
> > Luke
> >
>
