Hi Andrew,
Thanks for the response and sorry for the delay. The KIP looks good to me
and I have one follow up on AM3 regarding security for dlq topics.

Should there be a validation with the user principal for the dlq topic
WRITE access from the topic being read i.e. if application is reading from
T1 topic and dlq topic is dlq.T1 then should there be a check if
application has write permissions? The reason I am asking is that there
could be misconfigurations when another group uses
errors.deadletterqueue.topic.name to an already existing dlq topic, which
can flood the other applications dlq topic.

Regards,
Apoorv Mittal


On Tue, Jan 27, 2026 at 1:54 PM Andrew Schofield <[email protected]>
wrote:

> Hi PoAn,
> Thanks for your comment.
>
> PY00: I agree. I've made the changes to the KIP.
>
> Thanks,
> Andrew
>
> On 2026/01/15 10:16:18 PoAn Yang wrote:
> > Hi Andrew,
> >
> > Thanks for the KIP. I have a question about broker configuration.
> >
> > PY00: Would you consider mentioning the update mode for
> errors.deadletterqueue.topic.name.prefix
> > and errors.deadletterqueue.auto.create.topics.enable are cluster-wide?
> > Clarifying that these values must be consistent across the cluster (or
> updated dynamically as a cluster default)
> > would help preventing inconsistent values among brokers.
> >
> > Thanks,
> > PoAn
> >
> > > On Jan 8, 2026, at 6:18 PM, Andrew Schofield <[email protected]>
> wrote:
> > >
> > > Hi Shekhar,
> > > Thanks for your comment.
> > >
> > > If the leader of the DLQ topic-partition changes as we are trying to
> write to it,
> > > then the code will need to cope with this.
> > >
> > > If the leader of the share-partition changes, we do not need special
> processing.
> > > If the transition to ARCHIVED is affected by a share-partition
> leadership change,
> > > the new leader will be responsible for the state transition. For
> example, if a consumer
> > > has rejected a record, a leadership change will cause the rejection to
> fail, and the
> > > record will be delivered again. This new delivery attempt will be
> performed by the
> > > new leader, and if this delivery attempt results in a rejection, the
> new leader will
> > > be responsible for initiating the DLQ write.
> > >
> > > Hope this makes sense,
> > > Andrew
> > >
> > > On 2026/01/03 15:02:31 Shekhar Prasad Rajak via dev wrote:
> > >> Hi,
> > >> If leader changes during DLQ write, or  a share partition leader
> changes, the partition is marked FENCED and in-memory cache state is lost,
> I think we need to add those cases as well.
> > >> Ref
> https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/server/share/SharePartitionManager.java#L857
> > >>
> > >>
> > >>
> > >> Regards,Shekhar
> > >>
> > >>
> > >>
> > >>    On Monday 29 December 2025 at 11:53:20 pm GMT+5:30, Andrew
> Schofield <[email protected]> wrote:
> > >>
> > >> Hi Abhinav,
> > >> Thanks for your comments.
> > >>
> > >> AD01: Even if we were to allow the client to write to the DLQ topic,
> > >> it would not be sufficient for situations in which the problem is one
> > >> that the client cannot handle. So, my view is that it's preferable to
> > >> use the same mechanism for all DLQ topic writes, regardless of
> > >> whether the consumer initiated the process by rejecting a
> > >> record or not.
> > >>
> > >> AD02: I have added a metric for counting failed DLQ topic produce
> > >> requests per group. The KIP does say that the broker logs an
> > >> error when it fails to produce to the DLQ topic.
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >> On 2025/12/16 10:38:39 Abhinav Dixit via dev wrote:
> > >>> Hi Andrew,
> > >>> Thanks for this KIP. I have a couple of questions -
> > >>>
> > >>> AD01: From an implementation perspective, why can't we create/write
> records
> > >>> to the DLQ topic from the client? Why do we want to do it from the
> broker?
> > >>> As far as I understand, archiving the record on the share partition
> and
> > >>> writing records to DLQ are independent? As you've mentioned in the
> KIP, "It
> > >>> is possible in rare situations that more than one DLQ record could be
> > >>> written for a particular undeliverable record", won't we minimize
> these
> > >>> scenarios (by eliminating the dependency on persister write state
> result)
> > >>> by writing records to the DLQ from the client?
> > >>>
> > >>> AD02: I agree with AM01 that we should emit a metric which can
> report the
> > >>> count of failures of writing records to DLQ topic which an
> application
> > >>> developer can monitor. If we are logging an error, maybe we should
> log the
> > >>> count of such failures periodically?
> > >>>
> > >>> Regards,
> > >>> Abhinav Dixit
> > >>>
> > >>> On Fri, Dec 12, 2025 at 3:08 AM Apoorv Mittal <
> [email protected]>
> > >>> wrote:
> > >>>
> > >>>> Hi Andrew,
> > >>>> Thanks for the much needed enhancement for SHare Groups. Some
> questions:
> > >>>>
> > >>>> AM1: The KIP states that in case of some failure "the broker will
> log an
> > >>>> error", how an application developer will utilize this information
> and know
> > >>>> about any such occurrences? Should we emit a metric which can
> report the
> > >>>> count of such failures which an application developer can monitor?
> > >>>>
> > >>>> AM2: Today records can go to Archived state either when exceeded the
> > >>>> delivery limit or explicitly rejected by the client. I am expecting
> the
> > >>>> records will be written to dlq topic only in the former case i.e.
> when
> > >>>> exceeded the delivery limit, that's what KIP explains. If yes, then
> can't
> > >>>> there be a failure handling in the client which on serialization or
> other
> > >>>> issues want to reject the message explicitly to be placed on dlq?
> Should we
> > >>>> have a config which governs this behaviour i.e. if enabled then any
> > >>>> explicitly rejected record from client will also go to dlq?
> > >>>>
> > >>>> AM3: I read your response on the thread related to the tricky part
> of ACL
> > >>>> for DLQ topics and I have a question in the similar area. The KIP
> defines a
> > >>>> config "errors.deadletterqueue.auto.create.topics.enable" which if
> enabled
> > >>>> then broker can create the topic automatically using given other
> dlq topic
> > >>>> params. If a new dlq topic is created then what basic permissions
> should be
> > >>>> applied so the application developer can access? Should we provide
> > >>>> capability to create dlq topics automatically or should restrict
> that and
> > >>>> enforce it to be created by the application owner? By latter we
> know the
> > >>>> application owner has access to the dlq topic already.
> > >>>>
> > >>>> AM4: For the "errors.deadletterqueue.topic.name.prefix", I am
> expecting
> > >>>> that this applies well for auto created dlq topics. But how do we
> enforce
> > >>>> the prefix behaviour when the application developer provides the
> dlq topic
> > >>>> name in group configuration? Will there be a check while setting
> the group
> > >>>> configuration "errors.deadletterqueue.topic.name" as per broker
> expected
> > >>>> prefix?
> > >>>>
> > >>>> Regards,
> > >>>> Apoorv Mittal
> > >>>>
> > >>>>
> > >>>> On Wed, Dec 10, 2025 at 5:59 PM Federico Valeri <
> [email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Andrew, a few comments/questions from me:
> > >>>>>
> > >>>>> FV00: The KIP says "copying of the original record data into the
> DLQ
> > >>>>> is controlled by two configurations", but I only see the client
> side
> > >>>>> configuration in the latest revision.
> > >>>>>
> > >>>>> FV01: The KIP says: "When an undeliverable record transitions to
> the
> > >>>>> Archived state for such a group, a record is written onto the DLQ
> > >>>>> topic". Later on it mentions a new "Archiving" state. Can you
> clarify
> > >>>>> the state transition when sending a record to a DLQ?
> > >>>>>
> > >>>>> FV02: Is the new state required to ensure that the DLQ record is
> > >>>>> eventually written in case of the Share Coordinator failover?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Fede
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Dec 2, 2025 at 7:19 PM Andrew Schofield <
> [email protected]>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>> I'd like to bump this discussion thread for adding DLQs to share
> > >>>> groups.
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Andrew
> > >>>>>>
> > >>>>>> On 2025/10/16 19:02:48 Andrew Schofield wrote:
> > >>>>>>> Hi Chia-Ping,
> > >>>>>>> Apologies for not responding to your comments. I was having email
> > >>>>> problems
> > >>>>>>> and I’ve only just noticed the unanswered comments. Also, this is
> > >>>> not a
> > >>>>>>> direct reply.
> > >>>>>>>
> > >>>>>>>>> chia00: How can we specify the number of partitions and the
> > >>>>> replication factor
> > >>>>>>>   when `errors.deadletterqueue.auto.create.topics.enable` is set
> to
> > >>>>> true?
> > >>>>>>>
> > >>>>>>> Personally, I prefer to make people create their DLQ topics
> manually,
> > >>>>> but I take the
> > >>>>>>> point. In order to give full flexibility, the list of configs you
> > >>>> need
> > >>>>> is quite long including
> > >>>>>>> min.isr and compression. For consistency with Kafka Connect sink
> > >>>>> connectors, I
> > >>>>>>> could add `errors.deadletterqueue.topic.replication.factor` but
> > >>>> that's
> > >>>>> the only
> > >>>>>>> additional config provided by Kafka Connect. Is that worthwhile?
> I
> > >>>>> suggest not.
> > >>>>>>>
> > >>>>>>> The DLQ topic config in this KIP is broker-level config, while
> it's
> > >>>>> connector-level
> > >>>>>>> config for Kafka Connect. So, my preference is to just have one
> > >>>>> broker-level config
> > >>>>>>> for auto-creation on/off, and auto-create with the cluster's
> topic
> > >>>>> defaults. If anything
> > >>>>>>> more specific is required, the administrator can create the DLQ
> topic
> > >>>>> themselves with
> > >>>>>>> their preferences. Let me know what you think.
> > >>>>>>>
> > >>>>>>>>> chia01: Should the error stack trace be included in the message
> > >>>>> headers,
> > >>>>>>>   similar to what's done in KIP-298?
> > >>>>>>>
> > >>>>>>> In KIP-298, the code deciding to write a message to the DLQ is
> > >>>> running
> > >>>>> in the
> > >>>>>>> Kafka Connect task and an exception is readily available. In this
> > >>>> KIP,
> > >>>>> the code writing
> > >>>>>>> to the DLQ is running in the broker and it doesn't have any
> detail
> > >>>>> about why the
> > >>>>>>> record is being DLQed. I think that actually the
> > >>>>> __dlq.errors.exception.*  headers
> > >>>>>>> are not feasible without allowing the application to provide
> > >>>>> additional error context.
> > >>>>>>> That might be helpful one day, but that's extending this KIP more
> > >>>> than
> > >>>>> I intend.
> > >>>>>>> I have removed these headers from the KIP.
> > >>>>>>>
> > >>>>>>>>> chia02: Why does `errors.deadletterqueue.copy.record.enable`
> have
> > >>>>> different
> > >>>>>>> default values at the broker level and group level?
> > >>>>>>>
> > >>>>>>> I want the group administrator to be able to choose whether to
> copy
> > >>>>> the payloads.
> > >>>>>>> I was also thinking that it would be a good idea if the cluster
> > >>>>> administrator could
> > >>>>>>> prevent this across the cluster, but I've changed my mind and
> I've
> > >>>>> removed it.
> > >>>>>>>
> > >>>>>>> Maybe a better idea would simply to have a broker config
> > >>>>>>> `group.share.errors.deadletterqueue.enable` to turn DLQ on/off.
> The
> > >>>>> other
> > >>>>>>> broker configs in this KIP do not start `group.share.` because
> > >>>> they're
> > >>>>> intended
> > >>>>>>> for other DLQ uses by the broker in future.
> > >>>>>>>
> > >>>>>>> Note that although share.version=2 is required to enable DLQ,
> this
> > >>>>> isn't a suitable
> > >>>>>>> long-term switch because we might have share.version > 2 due to
> > >>>>> another future
> > >>>>>>> enhancement.
> > >>>>>>>
> > >>>>>>>>> chia03: Does the broker log an error for every message if the
> DLQ
> > >>>>> topic fails to be created?
> > >>>>>>>
> > >>>>>>> No, that seems excessive and likely to flood the logs. I would
> > >>>>> implement something like
> > >>>>>>> no more than one log per minute, per share-partition. That would
> be
> > >>>>> annoying enough to
> > >>>>>>> fix without being catastrophically verbose.
> > >>>>>>>
> > >>>>>>> Of course, if the group config `
> errors.deadletterqueue.topic.name`
> > >>>>> has a value which
> > >>>>>>> does not satisfy the broker config
> > >>>>> `errors.deadletterqueue.topic.name.prefix`, it will
> > >>>>>>> be considered a config error and the DLQ will not be used.
> > >>>>>>>
> > >>>>>>>>> chia04: Have you consider adding metrics for the DLQ?
> > >>>>>>>
> > >>>>>>> Yes, that is a good idea. I've added some metrics to the KIP.
> Please
> > >>>>> take a look.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Andrew
> > >>>>>>>
> > >>>>>>>> On 4 Aug 2025, at 11:30, Andrew Schofield <
> > >>>>> [email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>> Thanks for your comments on the KIP and sorry for the delay in
> > >>>>> responding.
> > >>>>>>>>
> > >>>>>>>> D01: Authorisation is the area of this KIP that I think is most
> > >>>>> tricky. The reason that I didn't implement specific
> > >>>>>>>> ACLs for DLQs because I was not convinced they would help. So,
> if
> > >>>>> you have a specific idea in mind, please
> > >>>>>>>> let me know. This is the area that I'm least comfortable with in
> > >>>> the
> > >>>>> KIP.
> > >>>>>>>>
> > >>>>>>>> I suppose maybe to set the DLQ name for a group, you could need
> a
> > >>>>> higher level of authorisation
> > >>>>>>>> than just ALTER_CONFIGS on the GROUP. But what I settled with in
> > >>>> the
> > >>>>> KIP was that DLQ topics
> > >>>>>>>> all start with the same prefix, defaulting to "dlq.", and that
> the
> > >>>>> topics do not automatically create.
> > >>>>>>>>
> > >>>>>>>> D02: I can see that. I've added a config which I've called
> > >>>>> errors.deadletterqueue.auto.create.topics.enable
> > >>>>>>>> just to have a consistent prefix on all of the config names.
> Let me
> > >>>>> know what you think.
> > >>>>>>>>
> > >>>>>>>> D03: I've added some text about failure scenarios when
> attempting
> > >>>> to
> > >>>>> write records to the DLQ.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Andrew
> > >>>>>>>> ________________________________________
> > >>>>>>>> From: isding_l <[email protected]>
> > >>>>>>>> Sent: 16 July 2025 04:18
> > >>>>>>>> To: dev <[email protected]>
> > >>>>>>>> Subject: Re: [DISCUSS]: KIP-1191: Dead-letter queues for share
> > >>>> groups
> > >>>>>>>>
> > >>>>>>>> Hi Andrew,
> > >>>>>>>> Thanks for the nice KIP, This KIP design for introducing
> > >>>> dead-letter
> > >>>>> queues (DLQs) for Share Groups is generally clear and reasonable,
> > >>>>> addressing the key pain points of handling "poison message".
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> D01: Should we consider implementing independent ACL
> configurations
> > >>>>> for DLQs? This would enable separate management of DLQ topic
> read/write
> > >>>>> permissions from source topics, preventing privilege escalation
> attacks
> > >>>> via
> > >>>>> "poison message" + DLQ mechanisms.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> D02: While disabling automatic DLQ topic creation is justifiable
> > >>>> for
> > >>>>> security, it creates operational overhead in automated
> deployments. Can
> > >>>> we
> > >>>>> introduce a configuration parameter auto.create.dlq.topics.enable
> to
> > >>>> govern
> > >>>>> this behavior?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> D03: How should we handle failure scenarios when brokers
> attempt to
> > >>>>> write records to the DLQ?
> > >>>>>>>> ---- Replied Message ----
> > >>>>>>>> | From | Andrew Schofield<[email protected]> |
> > >>>>>>>> | Date | 07/08/2025 17:54 |
> > >>>>>>>> | To | [email protected]<[email protected]> |
> > >>>>>>>> | Subject | [DISCUSS]: KIP-1191: Dead-letter queues for share
> > >>>> groups
> > >>>>> |
> > >>>>>>>> Hi,
> > >>>>>>>> I'd like to start discussion on KIP-1191 which adds dead-letter
> > >>>>> queue support for share groups.
> > >>>>>>>> Records which cannot be processed by consumers in a share group
> can
> > >>>>> be automatically copied
> > >>>>>>>> onto another topic for a closer look.
> > >>>>>>>>
> > >>>>>>>> KIP:
> > >>>>>
> > >>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Andrew
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>

Reply via email to