Hi devs,

As stated in the motivation section in KIP-854
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry>:

With idempotent producers becoming the default in Kafka, this means that
unless otherwise specified, all new producers will be given producer IDs.
Some (inefficient) applications may now create many non-transactional
idempotent producers. Each of these producers will be assigned a producer
ID and these IDs and their metadata are stored in the broker memory, which
might cause brokers to run out of memory.

Justine (in cc) and I, along with some other team members, are working
on solutions for this issue, but none of them solves it completely
without side effects. The trade-off we can't decide on is
"availability" vs. "idempotency guarantees": some of the solutions
sacrifice produce availability (1, 2, 5) and others sacrifice
idempotency guarantees (3). It would be useful to know whether people
generally have a preference one way or the other, or whether there are
better solutions we haven't considered.

Here are the proposals we came up with:

1. Limit the total number of active producer IDs that can be allocated.
-> This is the simplest solution. However, since the OOM issue is
usually caused by a rogue or misconfigured client, this limit might
also "punish" well-behaved clients by blocking them from sending
messages.

2. Throttle the producer ID allocation rate.
-> Same concern as solution #1.

3. Limit the number of active producer IDs (sort of like an LRU cache;
see the first sketch after this list).
-> The idea here is that when a misconfigured client pushes us over the
limit, we expire the oldest entries. The concern is that we risk losing
idempotency guarantees, and currently we have no way to notify clients
that their idempotency guarantees are gone. Besides, the least recently
used entries that get evicted are not always from the "bad" clients.

4. Allow clients to "close" producer ID usage (see the second sketch
after this list).
-> We can provide a way for a producer to "close" its producer ID
usage. Currently, we only have the INIT_PRODUCER_ID request to allocate
one; after that, the broker keeps the producer ID metadata even if the
producer is "closed". With a close API (ex: END_PRODUCER_ID), the
broker could remove the entry, and the client could send the request
when the producer closes. The concern is that old clients (including
non-Java clients) will still suffer from the OOM issue.

5. Limit/throttle producer IDs per principal.
-> Although this idea lets us contain the impact to a specific
principal, the same concern as solutions #1 and #2 still exists.
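
To make proposal 3 concrete, here's a minimal sketch of the LRU idea in
Java, assuming a LinkedHashMap in access order. The class and field
names (ProducerIdCache, maxActiveProducerIds) are made up for
illustration and are not existing broker code:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: cap the number of active producer IDs and evict the
    // least recently used entry once the cap is exceeded.
    public class ProducerIdCache<V> extends LinkedHashMap<Long, V> {
        private final int maxActiveProducerIds;

        public ProducerIdCache(int maxActiveProducerIds) {
            // accessOrder=true: iteration order is least recently accessed first
            super(16, 0.75f, true);
            this.maxActiveProducerIds = maxActiveProducerIds;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
            // Evicting here silently drops that producer's state, which is
            // exactly where idempotency can break: a retry from the evicted
            // producer is no longer recognized as a duplicate.
            return size() > maxActiveProducerIds;
        }
    }

This also shows why eviction is hard to target at the "bad" client: the
cache itself has no notion of which client or principal "owns" an
entry, so a well-behaved but idle producer can be evicted first.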
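
And for proposal 4, a rough client-side sketch of what the close path
could look like. END_PRODUCER_ID does not exist today, and the
ProducerIdBroker interface below is purely hypothetical, only meant to
show where the new request would be sent:

    // Hypothetical sketch only: END_PRODUCER_ID does not exist today.
    interface ProducerIdBroker {
        long initProducerId();                // existing INIT_PRODUCER_ID path
        void endProducerId(long producerId);  // proposed END_PRODUCER_ID path
    }

    class IdempotentProducer implements AutoCloseable {
        private final ProducerIdBroker broker;
        private final long producerId;

        IdempotentProducer(ProducerIdBroker broker) {
            this.broker = broker;
            this.producerId = broker.initProducerId();
        }

        @Override
        public void close() {
            // Tell the broker this ID is no longer in use so its metadata
            // can be dropped immediately instead of sitting in memory until
            // it expires. Old clients that never send this would still leak
            // entries, which is the concern noted above.
            broker.endProducerId(producerId);
        }
    }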

Any thoughts/feedback are welcome.

Thank you.
Luke
