Hi Andrew,
Thanks for the much needed enhancement for SHare Groups. Some questions:

AM1: The KIP states that in case of some failure "the broker will log an
error", how an application developer will utilize this information and know
about any such occurrences? Should we emit a metric which can report the
count of such failures which an application developer can monitor?

AM2: Today records can go to Archived state either when exceeded the
delivery limit or explicitly rejected by the client. I am expecting the
records will be written to dlq topic only in the former case i.e. when
exceeded the delivery limit, that's what KIP explains. If yes, then can't
there be a failure handling in the client which on serialization or other
issues want to reject the message explicitly to be placed on dlq? Should we
have a config which governs this behaviour i.e. if enabled then any
explicitly rejected record from client will also go to dlq?

AM3: I read your response on the thread related to the tricky part of ACL
for DLQ topics and I have a question in the similar area. The KIP defines a
config "errors.deadletterqueue.auto.create.topics.enable" which if enabled
then broker can create the topic automatically using given other dlq topic
params. If a new dlq topic is created then what basic permissions should be
applied so the application developer can access? Should we provide
capability to create dlq topics automatically or should restrict that and
enforce it to be created by the application owner? By latter we know the
application owner has access to the dlq topic already.

AM4: For the "errors.deadletterqueue.topic.name.prefix", I am expecting
that this applies well for auto created dlq topics. But how do we enforce
the prefix behaviour when the application developer provides the dlq topic
name in group configuration? Will there be a check while setting the group
configuration "errors.deadletterqueue.topic.name" as per broker expected
prefix?

Regards,
Apoorv Mittal


On Wed, Dec 10, 2025 at 5:59 PM Federico Valeri <[email protected]>
wrote:

> Hi Andrew, a few comments/questions from me:
>
> FV00: The KIP says "copying of the original record data into the DLQ
> is controlled by two configurations", but I only see the client side
> configuration in the latest revision.
>
> FV01: The KIP says: "When an undeliverable record transitions to the
> Archived state for such a group, a record is written onto the DLQ
> topic". Later on it mentions a new "Archiving" state. Can you clarify
> the state transition when sending a record to a DLQ?
>
> FV02: Is the new state required to ensure that the DLQ record is
> eventually written in case of the Share Coordinator failover?
>
> Thanks,
> Fede
>
>
> On Tue, Dec 2, 2025 at 7:19 PM Andrew Schofield <[email protected]>
> wrote:
> >
> > Hi,
> > I'd like to bump this discussion thread for adding DLQs to share groups.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> >
> > Thanks,
> > Andrew
> >
> > On 2025/10/16 19:02:48 Andrew Schofield wrote:
> > > Hi Chia-Ping,
> > > Apologies for not responding to your comments. I was having email
> problems
> > > and I’ve only just noticed the unanswered comments. Also, this is not a
> > > direct reply.
> > >
> > > >> chia00: How can we specify the number of partitions and the
> replication factor
> > >  when `errors.deadletterqueue.auto.create.topics.enable` is set to
> true?
> > >
> > > Personally, I prefer to make people create their DLQ topics manually,
> but I take the
> > > point. In order to give full flexibility, the list of configs you need
> is quite long including
> > > min.isr and compression. For consistency with Kafka Connect sink
> connectors, I
> > > could add `errors.deadletterqueue.topic.replication.factor` but that's
> the only
> > > additional config provided by Kafka Connect. Is that worthwhile? I
> suggest not.
> > >
> > > The DLQ topic config in this KIP is broker-level config, while it's
> connector-level
> > > config for Kafka Connect. So, my preference is to just have one
> broker-level config
> > > for auto-creation on/off, and auto-create with the cluster's topic
> defaults. If anything
> > > more specific is required, the administrator can create the DLQ topic
> themselves with
> > > their preferences. Let me know what you think.
> > >
> > > >> chia01: Should the error stack trace be included in the message
> headers,
> > >  similar to what's done in KIP-298?
> > >
> > > In KIP-298, the code deciding to write a message to the DLQ is running
> in the
> > > Kafka Connect task and an exception is readily available. In this KIP,
> the code writing
> > > to the DLQ is running in the broker and it doesn't have any detail
> about why the
> > > record is being DLQed. I think that actually the
> __dlq.errors.exception.*  headers
> > > are not feasible without allowing the application to provide
> additional error context.
> > > That might be helpful one day, but that's extending this KIP more than
> I intend.
> > > I have removed these headers from the KIP.
> > >
> > > >> chia02: Why does `errors.deadletterqueue.copy.record.enable` have
> different
> > > default values at the broker level and group level?
> > >
> > > I want the group administrator to be able to choose whether to copy
> the payloads.
> > > I was also thinking that it would be a good idea if the cluster
> administrator could
> > > prevent this across the cluster, but I've changed my mind and I've
> removed it.
> > >
> > > Maybe a better idea would simply to have a broker config
> > > `group.share.errors.deadletterqueue.enable` to turn DLQ on/off. The
> other
> > > broker configs in this KIP do not start `group.share.` because they're
> intended
> > > for other DLQ uses by the broker in future.
> > >
> > > Note that although share.version=2 is required to enable DLQ, this
> isn't a suitable
> > > long-term switch because we might have share.version > 2 due to
> another future
> > > enhancement.
> > >
> > > >> chia03: Does the broker log an error for every message if the DLQ
> topic fails to be created?
> > >
> > > No, that seems excessive and likely to flood the logs. I would
> implement something like
> > > no more than one log per minute, per share-partition. That would be
> annoying enough to
> > > fix without being catastrophically verbose.
> > >
> > > Of course, if the group config `errors.deadletterqueue.topic.name`
> has a value which
> > > does not satisfy the broker config
> `errors.deadletterqueue.topic.name.prefix`, it will
> > > be considered a config error and the DLQ will not be used.
> > >
> > > >> chia04: Have you consider adding metrics for the DLQ?
> > >
> > > Yes, that is a good idea. I've added some metrics to the KIP. Please
> take a look.
> > >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 4 Aug 2025, at 11:30, Andrew Schofield <
> [email protected]> wrote:
> > > >
> > > > Hi,
> > > > Thanks for your comments on the KIP and sorry for the delay in
> responding.
> > > >
> > > > D01: Authorisation is the area of this KIP that I think is most
> tricky. The reason that I didn't implement specific
> > > > ACLs for DLQs because I was not convinced they would help. So, if
> you have a specific idea in mind, please
> > > > let me know. This is the area that I'm least comfortable with in the
> KIP.
> > > >
> > > > I suppose maybe to set the DLQ name for a group, you could need a
> higher level of authorisation
> > > > than just ALTER_CONFIGS on the GROUP. But what I settled with in the
> KIP was that DLQ topics
> > > > all start with the same prefix, defaulting to "dlq.", and that the
> topics do not automatically create.
> > > >
> > > > D02: I can see that. I've added a config which I've called
> errors.deadletterqueue.auto.create.topics.enable
> > > > just to have a consistent prefix on all of the config names. Let me
> know what you think.
> > > >
> > > > D03: I've added some text about failure scenarios when attempting to
> write records to the DLQ.
> > > >
> > > > Thanks,
> > > > Andrew
> > > > ________________________________________
> > > > From: isding_l <[email protected]>
> > > > Sent: 16 July 2025 04:18
> > > > To: dev <[email protected]>
> > > > Subject: Re: [DISCUSS]: KIP-1191: Dead-letter queues for share groups
> > > >
> > > > Hi Andrew,
> > > > Thanks for the nice KIP, This KIP design for introducing dead-letter
> queues (DLQs) for Share Groups is generally clear and reasonable,
> addressing the key pain points of handling "poison message".
> > > >
> > > >
> > > > D01: Should we consider implementing independent ACL configurations
> for DLQs? This would enable separate management of DLQ topic read/write
> permissions from source topics, preventing privilege escalation attacks via
> "poison message" + DLQ mechanisms.
> > > >
> > > >
> > > > D02: While disabling automatic DLQ topic creation is justifiable for
> security, it creates operational overhead in automated deployments. Can we
> introduce a configuration parameter auto.create.dlq.topics.enable to govern
> this behavior?
> > > >
> > > >
> > > > D03: How should we handle failure scenarios when brokers attempt to
> write records to the DLQ?
> > > > ---- Replied Message ----
> > > > | From | Andrew Schofield<[email protected]> |
> > > > | Date | 07/08/2025 17:54 |
> > > > | To | [email protected]<[email protected]> |
> > > > | Subject | [DISCUSS]: KIP-1191: Dead-letter queues for share groups
> |
> > > > Hi,
> > > > I'd like to start discussion on KIP-1191 which adds dead-letter
> queue support for share groups.
> > > > Records which cannot be processed by consumers in a share group can
> be automatically copied
> > > > onto another topic for a closer look.
> > > >
> > > > KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> > > >
> > > > Thanks,
> > > > Andrew
> > >
> > >
>

Reply via email to