Hi Andrew,

Thanks for the KIP. I have a question about broker configuration.

PY00: Would you consider mentioning that the update mode for
errors.deadletterqueue.topic.name.prefix
and errors.deadletterqueue.auto.create.topics.enable is cluster-wide?
Clarifying that these values must be consistent across the cluster (or updated
dynamically as a cluster default)
would help prevent inconsistent values among brokers.
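
For example, a rough sketch of a dynamic cluster-wide update with the Java Admin
client, assuming the KIP's errors.deadletterqueue.* configs end up being
dynamically updatable (the bootstrap address and the values below are placeholders):

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class ClusterWideDlqDefaults {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            try (Admin admin = Admin.create(props)) {
                // An empty broker name targets the cluster-wide dynamic default,
                // so every broker picks up the same value.
                ConfigResource clusterDefault =
                    new ConfigResource(ConfigResource.Type.BROKER, "");
                Map<ConfigResource, Collection<AlterConfigOp>> ops = Map.of(
                    clusterDefault, List.of(
                        new AlterConfigOp(
                            new ConfigEntry("errors.deadletterqueue.topic.name.prefix", "dlq."),
                            AlterConfigOp.OpType.SET),
                        new AlterConfigOp(
                            new ConfigEntry("errors.deadletterqueue.auto.create.topics.enable", "false"),
                            AlterConfigOp.OpType.SET)));
                admin.incrementalAlterConfigs(ops).all().get();
            }
        }
    }

The same could be done with kafka-configs.sh using --entity-type brokers --entity-default.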

Thanks,
PoAn

> On Jan 8, 2026, at 6:18 PM, Andrew Schofield <[email protected]> wrote:
> 
> Hi Shekhar,
> Thanks for your comment.
> 
> If the leader of the DLQ topic-partition changes as we are trying to write to 
> it,
> then the code will need to cope with this.
> 
> If the leader of the share-partition changes, we do not need special 
> processing.
> If the transition to ARCHIVED is affected by a share-partition leadership 
> change,
> the new leader will be responsible for the state transition. For example, if 
> a consumer
> has rejected a record, a leadership change will cause the rejection to fail, 
> and the
> record will be delivered again. This new delivery attempt will be performed 
> by the
> new leader, and if this delivery attempt results in a rejection, the new 
> leader will
> be responsible for initiating the DLQ write.
> 
> Hope this makes sense,
> Andrew
> 
> On 2026/01/03 15:02:31 Shekhar Prasad Rajak via dev wrote:
>> Hi,
>> If the leader changes during a DLQ write, or a share-partition leader changes,
>> the partition is marked FENCED and the in-memory cache state is lost. I think we
>> need to add those cases as well.
>> Ref 
>> https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/server/share/SharePartitionManager.java#L857
>> 
>> 
>> 
>> Regards,
>> Shekhar
>> 
>> 
>> 
>>    On Monday 29 December 2025 at 11:53:20 pm GMT+5:30, Andrew Schofield 
>> <[email protected]> wrote:  
>> 
>> Hi Abhinav,
>> Thanks for your comments.
>> 
>> AD01: Even if we were to allow the client to write to the DLQ topic,
>> it would not be sufficient for situations in which the problem is one
>> that the client cannot handle. So, my view is that it's preferable to
>> use the same mechanism for all DLQ topic writes, regardless of
>> whether the consumer initiated the process by rejecting a
>> record or not.
>> 
>> AD02: I have added a metric for counting failed DLQ topic produce
>> requests per group. The KIP does say that the broker logs an
>> error when it fails to produce to the DLQ topic.
>> 
>> Thanks,
>> Andrew
>> 
>> On 2025/12/16 10:38:39 Abhinav Dixit via dev wrote:
>>> Hi Andrew,
>>> Thanks for this KIP. I have a couple of questions -
>>> 
>>> AD01: From an implementation perspective, why can't we create/write records
>>> to the DLQ topic from the client? Why do we want to do it from the broker?
>>> As far as I understand, archiving the record on the share partition and
>>> writing records to the DLQ are independent? As you've mentioned in the KIP, "It
>>> is possible in rare situations that more than one DLQ record could be
>>> written for a particular undeliverable record", won't we minimize these
>>> scenarios (by eliminating the dependency on the persister write state result)
>>> by writing records to the DLQ from the client?
>>> 
>>> AD02: I agree with AM1 that we should emit a metric reporting the
>>> count of failures when writing records to the DLQ topic, which an application
>>> developer can monitor. If we are logging an error, maybe we should also log the
>>> count of such failures periodically?
>>> 
>>> Regards,
>>> Abhinav Dixit
>>> 
>>> On Fri, Dec 12, 2025 at 3:08 AM Apoorv Mittal <[email protected]>
>>> wrote:
>>> 
>>>> Hi Andrew,
>>>> Thanks for the much-needed enhancement for Share Groups. Some questions:
>>>> 
>>>> AM1: The KIP states that in case of some failure "the broker will log an
>>>> error". How will an application developer utilize this information and know
>>>> about any such occurrences? Should we emit a metric reporting the
>>>> count of such failures which an application developer can monitor?
>>>> 
>>>> AM2: Today, records can go to the Archived state either when the delivery
>>>> limit is exceeded or when they are explicitly rejected by the client. I expect
>>>> that records will be written to the DLQ topic only in the former case, i.e.
>>>> when the delivery limit is exceeded, which is what the KIP explains. If so,
>>>> couldn't there be failure handling in the client which, on serialization or
>>>> other issues, wants to reject the message explicitly so that it is placed on
>>>> the DLQ? Should we have a config which governs this behaviour, i.e. if enabled,
>>>> any record explicitly rejected by the client will also go to the DLQ?
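>>>> 
>>>> For context, explicit rejection from the client looks roughly like this with
>>>> the share consumer (a rough sketch; the topic, group, process() helper and the
>>>> explicit-acknowledgement setting are illustrative):
>>>> 
>>>>     import org.apache.kafka.clients.consumer.AcknowledgeType;
>>>>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>>>>     import org.apache.kafka.clients.consumer.ConsumerRecords;
>>>>     import org.apache.kafka.clients.consumer.KafkaShareConsumer;
>>>>     import org.apache.kafka.common.serialization.StringDeserializer;
>>>> 
>>>>     import java.time.Duration;
>>>>     import java.util.List;
>>>>     import java.util.Properties;
>>>> 
>>>>     public class RejectingShareConsumer {
>>>>         public static void main(String[] args) {
>>>>             Properties props = new Properties();
>>>>             props.put("bootstrap.servers", "localhost:9092");
>>>>             props.put("group.id", "my-share-group");
>>>>             props.put("key.deserializer", StringDeserializer.class.getName());
>>>>             props.put("value.deserializer", StringDeserializer.class.getName());
>>>>             props.put("share.acknowledgement.mode", "explicit");
>>>>             try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
>>>>                 consumer.subscribe(List.of("orders"));
>>>>                 while (true) {
>>>>                     ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
>>>>                     for (ConsumerRecord<String, String> record : records) {
>>>>                         try {
>>>>                             process(record); // hypothetical application logic
>>>>                             consumer.acknowledge(record, AcknowledgeType.ACCEPT);
>>>>                         } catch (Exception e) {
>>>>                             // Explicit rejection; AM2 asks whether such records should also reach the DLQ.
>>>>                             consumer.acknowledge(record, AcknowledgeType.REJECT);
>>>>                         }
>>>>                     }
>>>>                     consumer.commitSync();
>>>>                 }
>>>>             }
>>>>         }
>>>> 
>>>>         private static void process(ConsumerRecord<String, String> record) { /* ... */ }
>>>>     }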
>>>> 
>>>> AM3: I read your response on the thread related to the tricky part of ACLs
>>>> for DLQ topics and I have a question in a similar area. The KIP defines a
>>>> config "errors.deadletterqueue.auto.create.topics.enable" which, if enabled,
>>>> lets the broker create the topic automatically using the other DLQ topic
>>>> params. If a new DLQ topic is created, what basic permissions should be
>>>> applied so that the application developer can access it? Should we provide the
>>>> capability to create DLQ topics automatically, or should we restrict that and
>>>> require the topic to be created by the application owner? With the latter, we
>>>> know the application owner already has access to the DLQ topic.
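>>>> 
>>>> One way the access question could be handled, as a rough sketch: the cluster
>>>> administrator grants the application principal prefixed ACLs covering the DLQ
>>>> prefix up front, so access does not depend on which component creates the
>>>> topic (the principal and prefix below are made up):
>>>> 
>>>>     import org.apache.kafka.clients.admin.Admin;
>>>>     import org.apache.kafka.common.acl.AccessControlEntry;
>>>>     import org.apache.kafka.common.acl.AclBinding;
>>>>     import org.apache.kafka.common.acl.AclOperation;
>>>>     import org.apache.kafka.common.acl.AclPermissionType;
>>>>     import org.apache.kafka.common.resource.PatternType;
>>>>     import org.apache.kafka.common.resource.ResourcePattern;
>>>>     import org.apache.kafka.common.resource.ResourceType;
>>>> 
>>>>     import java.util.List;
>>>>     import java.util.Properties;
>>>> 
>>>>     public class GrantDlqAccess {
>>>>         public static void main(String[] args) throws Exception {
>>>>             Properties props = new Properties();
>>>>             props.put("bootstrap.servers", "localhost:9092");
>>>>             try (Admin admin = Admin.create(props)) {
>>>>                 // A PREFIXED pattern covers every topic starting with "dlq.",
>>>>                 // including DLQ topics created later.
>>>>                 ResourcePattern dlqTopics =
>>>>                     new ResourcePattern(ResourceType.TOPIC, "dlq.", PatternType.PREFIXED);
>>>>                 AccessControlEntry allowRead = new AccessControlEntry(
>>>>                     "User:app-team", "*", AclOperation.READ, AclPermissionType.ALLOW);
>>>>                 AccessControlEntry allowDescribe = new AccessControlEntry(
>>>>                     "User:app-team", "*", AclOperation.DESCRIBE, AclPermissionType.ALLOW);
>>>>                 admin.createAcls(List.of(
>>>>                     new AclBinding(dlqTopics, allowRead),
>>>>                     new AclBinding(dlqTopics, allowDescribe))).all().get();
>>>>             }
>>>>         }
>>>>     }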
>>>> 
>>>> AM4: For "errors.deadletterqueue.topic.name.prefix", I expect
>>>> that this applies well to auto-created DLQ topics. But how do we enforce
>>>> the prefix behaviour when the application developer provides the DLQ topic
>>>> name in the group configuration? Will there be a check that the group
>>>> configuration "errors.deadletterqueue.topic.name" matches the prefix expected
>>>> by the broker?
>>>> 
>>>> Regards,
>>>> Apoorv Mittal
>>>> 
>>>> 
>>>> On Wed, Dec 10, 2025 at 5:59 PM Federico Valeri <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Andrew, a few comments/questions from me:
>>>>> 
>>>>> FV00: The KIP says "copying of the original record data into the DLQ
>>>>> is controlled by two configurations", but I only see the client side
>>>>> configuration in the latest revision.
>>>>> 
>>>>> FV01: The KIP says: "When an undeliverable record transitions to the
>>>>> Archived state for such a group, a record is written onto the DLQ
>>>>> topic". Later on it mentions a new "Archiving" state. Can you clarify
>>>>> the state transition when sending a record to a DLQ?
>>>>> 
>>>>> FV02: Is the new state required to ensure that the DLQ record is
>>>>> eventually written in case of the Share Coordinator failover?
>>>>> 
>>>>> Thanks,
>>>>> Fede
>>>>> 
>>>>> 
>>>>> On Tue, Dec 2, 2025 at 7:19 PM Andrew Schofield <[email protected]>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> I'd like to bump this discussion thread for adding DLQs to share
>>>> groups.
>>>>>> 
>>>>>> 
>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
>>>>>> 
>>>>>> Thanks,
>>>>>> Andrew
>>>>>> 
>>>>>> On 2025/10/16 19:02:48 Andrew Schofield wrote:
>>>>>>> Hi Chia-Ping,
>>>>>>> Apologies for not responding to your comments. I was having email
>>>>> problems
>>>>>>> and I’ve only just noticed the unanswered comments. Also, this is
>>>> not a
>>>>>>> direct reply.
>>>>>>> 
>>>>>>>>> chia00: How can we specify the number of partitions and the
>>>>> replication factor
>>>>>>>   when `errors.deadletterqueue.auto.create.topics.enable` is set to
>>>>> true?
>>>>>>> 
>>>>>>> Personally, I prefer to make people create their DLQ topics manually,
>>>>> but I take the
>>>>>>> point. In order to give full flexibility, the list of configs you
>>>> need
>>>>> is quite long including
>>>>>>> min.isr and compression. For consistency with Kafka Connect sink
>>>>> connectors, I
>>>>>>> could add `errors.deadletterqueue.topic.replication.factor` but
>>>> that's
>>>>> the only
>>>>>>> additional config provided by Kafka Connect. Is that worthwhile? I
>>>>> suggest not.
>>>>>>> 
>>>>>>> The DLQ topic config in this KIP is broker-level config, while it's
>>>>> connector-level
>>>>>>> config for Kafka Connect. So, my preference is to just have one
>>>>> broker-level config
>>>>>>> for auto-creation on/off, and auto-create with the cluster's topic
>>>>> defaults. If anything
>>>>>>> more specific is required, the administrator can create the DLQ topic
>>>>> themselves with
>>>>>>> their preferences. Let me know what you think.
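>>>>>>> 
>>>>>>> For example, the manual route might look like this (a rough sketch; the
>>>>>>> topic name and settings below are made up):
>>>>>>> 
>>>>>>>     import org.apache.kafka.clients.admin.Admin;
>>>>>>>     import org.apache.kafka.clients.admin.NewTopic;
>>>>>>> 
>>>>>>>     import java.util.List;
>>>>>>>     import java.util.Map;
>>>>>>>     import java.util.Properties;
>>>>>>> 
>>>>>>>     public class CreateDlqTopic {
>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>             Properties props = new Properties();
>>>>>>>             props.put("bootstrap.servers", "localhost:9092");
>>>>>>>             try (Admin admin = Admin.create(props)) {
>>>>>>>                 // Explicit partitions, replication factor and topic configs,
>>>>>>>                 // rather than relying on auto-creation with cluster defaults.
>>>>>>>                 NewTopic dlq = new NewTopic("dlq.orders", 6, (short) 3)
>>>>>>>                     .configs(Map.of(
>>>>>>>                         "min.insync.replicas", "2",
>>>>>>>                         "compression.type", "producer"));
>>>>>>>                 admin.createTopics(List.of(dlq)).all().get();
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }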
>>>>>>> 
>>>>>>>>> chia01: Should the error stack trace be included in the message
>>>>> headers,
>>>>>>>   similar to what's done in KIP-298?
>>>>>>> 
>>>>>>> In KIP-298, the code deciding to write a message to the DLQ is
>>>> running
>>>>> in the
>>>>>>> Kafka Connect task and an exception is readily available. In this
>>>> KIP,
>>>>> the code writing
>>>>>>> to the DLQ is running in the broker and it doesn't have any detail
>>>>> about why the
>>>>>>> record is being DLQed. I think that actually the
>>>>> __dlq.errors.exception.*  headers
>>>>>>> are not feasible without allowing the application to provide
>>>>> additional error context.
>>>>>>> That might be helpful one day, but that's extending this KIP more
>>>> than
>>>>> I intend.
>>>>>>> I have removed these headers from the KIP.
>>>>>>> 
>>>>>>>>> chia02: Why does `errors.deadletterqueue.copy.record.enable` have
>>>>> different
>>>>>>> default values at the broker level and group level?
>>>>>>> 
>>>>>>> I want the group administrator to be able to choose whether to copy
>>>>> the payloads.
>>>>>>> I was also thinking that it would be a good idea if the cluster
>>>>> administrator could
>>>>>>> prevent this across the cluster, but I've changed my mind and I've
>>>>> removed it.
>>>>>>> 
>>>>>>> Maybe a better idea would simply be to have a broker config
>>>>>>> `group.share.errors.deadletterqueue.enable` to turn DLQ on/off. The other
>>>>>>> broker configs in this KIP do not start with `group.share.` because they're
>>>>>>> intended for other DLQ uses by the broker in future.
>>>>>>> 
>>>>>>> Note that although share.version=2 is required to enable DLQ, this
>>>>> isn't a suitable
>>>>>>> long-term switch because we might have share.version > 2 due to
>>>>> another future
>>>>>>> enhancement.
>>>>>>> 
>>>>>>>>> chia03: Does the broker log an error for every message if the DLQ
>>>>> topic fails to be created?
>>>>>>> 
>>>>>>> No, that seems excessive and likely to flood the logs. I would
>>>>> implement something like
>>>>>>> no more than one log per minute, per share-partition. That would be
>>>>> annoying enough to
>>>>>>> fix without being catastrophically verbose.
>>>>>>> 
>>>>>>> Of course, if the group config `errors.deadletterqueue.topic.name`
>>>>> has a value which
>>>>>>> does not satisfy the broker config
>>>>> `errors.deadletterqueue.topic.name.prefix`, it will
>>>>>>> be considered a config error and the DLQ will not be used.
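>>>>>>> 
>>>>>>> To make that concrete, a rough sketch of setting the group config with the
>>>>>>> Admin client, assuming the KIP's proposed group config name (the group and
>>>>>>> topic names below are made up; with the default prefix "dlq.", a value such
>>>>>>> as "my-dlq" would be rejected as a config error):
>>>>>>> 
>>>>>>>     import org.apache.kafka.clients.admin.Admin;
>>>>>>>     import org.apache.kafka.clients.admin.AlterConfigOp;
>>>>>>>     import org.apache.kafka.clients.admin.ConfigEntry;
>>>>>>>     import org.apache.kafka.common.config.ConfigResource;
>>>>>>> 
>>>>>>>     import java.util.Collection;
>>>>>>>     import java.util.List;
>>>>>>>     import java.util.Map;
>>>>>>>     import java.util.Properties;
>>>>>>> 
>>>>>>>     public class SetGroupDlqTopic {
>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>             Properties props = new Properties();
>>>>>>>             props.put("bootstrap.servers", "localhost:9092");
>>>>>>>             try (Admin admin = Admin.create(props)) {
>>>>>>>                 ConfigResource group =
>>>>>>>                     new ConfigResource(ConfigResource.Type.GROUP, "my-share-group");
>>>>>>>                 Map<ConfigResource, Collection<AlterConfigOp>> ops = Map.of(
>>>>>>>                     group, List.of(new AlterConfigOp(
>>>>>>>                         // Satisfies the broker's errors.deadletterqueue.topic.name.prefix
>>>>>>>                         new ConfigEntry("errors.deadletterqueue.topic.name", "dlq.orders"),
>>>>>>>                         AlterConfigOp.OpType.SET)));
>>>>>>>                 admin.incrementalAlterConfigs(ops).all().get();
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }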
>>>>>>> 
>>>>>>>>> chia04: Have you considered adding metrics for the DLQ?
>>>>>>> 
>>>>>>> Yes, that is a good idea. I've added some metrics to the KIP. Please
>>>>> take a look.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>> 
>>>>>>>> On 4 Aug 2025, at 11:30, Andrew Schofield <
>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> Thanks for your comments on the KIP and sorry for the delay in
>>>>> responding.
>>>>>>>> 
>>>>>>>> D01: Authorisation is the area of this KIP that I think is most tricky.
>>>>>>>> The reason that I didn't implement specific ACLs for DLQs is that I was
>>>>>>>> not convinced they would help. So, if you have a specific idea in mind,
>>>>>>>> please let me know. This is the area of the KIP that I'm least
>>>>>>>> comfortable with.
>>>>>>>> 
>>>>>>>> I suppose that to set the DLQ name for a group, you could need a higher
>>>>>>>> level of authorisation than just ALTER_CONFIGS on the GROUP. But what I
>>>>>>>> settled on in the KIP was that DLQ topics all start with the same prefix,
>>>>>>>> defaulting to "dlq.", and that the topics are not created automatically.
>>>>>>>> 
>>>>>>>> D02: I can see that. I've added a config which I've called
>>>>> errors.deadletterqueue.auto.create.topics.enable
>>>>>>>> just to have a consistent prefix on all of the config names. Let me
>>>>> know what you think.
>>>>>>>> 
>>>>>>>> D03: I've added some text about failure scenarios when attempting
>>>> to
>>>>> write records to the DLQ.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>>> ________________________________________
>>>>>>>> From: isding_l <[email protected]>
>>>>>>>> Sent: 16 July 2025 04:18
>>>>>>>> To: dev <[email protected]>
>>>>>>>> Subject: Re: [DISCUSS]: KIP-1191: Dead-letter queues for share
>>>> groups
>>>>>>>> 
>>>>>>>> Hi Andrew,
>>>>>>>> Thanks for the nice KIP. This KIP's design for introducing dead-letter
>>>>>>>> queues (DLQs) for Share Groups is generally clear and reasonable,
>>>>>>>> addressing the key pain points of handling "poison messages".
>>>>>>>> 
>>>>>>>> 
>>>>>>>> D01: Should we consider implementing independent ACL configurations
>>>>> for DLQs? This would enable separate management of DLQ topic read/write
>>>>> permissions from source topics, preventing privilege escalation attacks
>>>> via
>>>>> "poison message" + DLQ mechanisms.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> D02: While disabling automatic DLQ topic creation is justifiable
>>>> for
>>>>> security, it creates operational overhead in automated deployments. Can
>>>> we
>>>>> introduce a configuration parameter auto.create.dlq.topics.enable to
>>>> govern
>>>>> this behavior?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> D03: How should we handle failure scenarios when brokers attempt to
>>>>> write records to the DLQ?
>>>>>>>> ---- Replied Message ----
>>>>>>>> | From | Andrew Schofield<[email protected]> |
>>>>>>>> | Date | 07/08/2025 17:54 |
>>>>>>>> | To | [email protected]<[email protected]> |
>>>>>>>> | Subject | [DISCUSS]: KIP-1191: Dead-letter queues for share
>>>> groups
>>>>> |
>>>>>>>> Hi,
>>>>>>>> I'd like to start discussion on KIP-1191 which adds dead-letter
>>>>> queue support for share groups.
>>>>>>>> Records which cannot be processed by consumers in a share group can
>>>>> be automatically copied
>>>>>>>> onto another topic for a closer look.
>>>>>>>> 
>>>>>>>> KIP:
>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
