Hi Andrew, a few comments/questions from me: FV00: The KIP says "copying of the original record data into the DLQ is controlled by two configurations", but I only see the client side configuration in the latest revision.
FV01: The KIP says: "When an undeliverable record transitions to the Archived state for such a group, a record is written onto the DLQ topic". Later on it mentions a new "Archiving" state. Can you clarify the state transition when sending a record to a DLQ? FV02: Is the new state required to ensure that the DLQ record is eventually written in case of the Share Coordinator failover? Thanks, Fede On Tue, Dec 2, 2025 at 7:19 PM Andrew Schofield <[email protected]> wrote: > > Hi, > I'd like to bump this discussion thread for adding DLQs to share groups. > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups > > Thanks, > Andrew > > On 2025/10/16 19:02:48 Andrew Schofield wrote: > > Hi Chia-Ping, > > Apologies for not responding to your comments. I was having email problems > > and I’ve only just noticed the unanswered comments. Also, this is not a > > direct reply. > > > > >> chia00: How can we specify the number of partitions and the replication > > >> factor > > when `errors.deadletterqueue.auto.create.topics.enable` is set to true? > > > > Personally, I prefer to make people create their DLQ topics manually, but I > > take the > > point. In order to give full flexibility, the list of configs you need is > > quite long including > > min.isr and compression. For consistency with Kafka Connect sink > > connectors, I > > could add `errors.deadletterqueue.topic.replication.factor` but that's the > > only > > additional config provided by Kafka Connect. Is that worthwhile? I suggest > > not. > > > > The DLQ topic config in this KIP is broker-level config, while it's > > connector-level > > config for Kafka Connect. So, my preference is to just have one > > broker-level config > > for auto-creation on/off, and auto-create with the cluster's topic > > defaults. If anything > > more specific is required, the administrator can create the DLQ topic > > themselves with > > their preferences. Let me know what you think. > > > > >> chia01: Should the error stack trace be included in the message headers, > > similar to what's done in KIP-298? > > > > In KIP-298, the code deciding to write a message to the DLQ is running in > > the > > Kafka Connect task and an exception is readily available. In this KIP, the > > code writing > > to the DLQ is running in the broker and it doesn't have any detail about > > why the > > record is being DLQed. I think that actually the __dlq.errors.exception.* > > headers > > are not feasible without allowing the application to provide additional > > error context. > > That might be helpful one day, but that's extending this KIP more than I > > intend. > > I have removed these headers from the KIP. > > > > >> chia02: Why does `errors.deadletterqueue.copy.record.enable` have > > >> different > > default values at the broker level and group level? > > > > I want the group administrator to be able to choose whether to copy the > > payloads. > > I was also thinking that it would be a good idea if the cluster > > administrator could > > prevent this across the cluster, but I've changed my mind and I've removed > > it. > > > > Maybe a better idea would simply to have a broker config > > `group.share.errors.deadletterqueue.enable` to turn DLQ on/off. The other > > broker configs in this KIP do not start `group.share.` because they're > > intended > > for other DLQ uses by the broker in future. > > > > Note that although share.version=2 is required to enable DLQ, this isn't a > > suitable > > long-term switch because we might have share.version > 2 due to another > > future > > enhancement. > > > > >> chia03: Does the broker log an error for every message if the DLQ topic > > >> fails to be created? > > > > No, that seems excessive and likely to flood the logs. I would implement > > something like > > no more than one log per minute, per share-partition. That would be > > annoying enough to > > fix without being catastrophically verbose. > > > > Of course, if the group config `errors.deadletterqueue.topic.name` has a > > value which > > does not satisfy the broker config > > `errors.deadletterqueue.topic.name.prefix`, it will > > be considered a config error and the DLQ will not be used. > > > > >> chia04: Have you consider adding metrics for the DLQ? > > > > Yes, that is a good idea. I've added some metrics to the KIP. Please take a > > look. > > > > > > Thanks, > > Andrew > > > > > On 4 Aug 2025, at 11:30, Andrew Schofield > > > <[email protected]> wrote: > > > > > > Hi, > > > Thanks for your comments on the KIP and sorry for the delay in responding. > > > > > > D01: Authorisation is the area of this KIP that I think is most tricky. > > > The reason that I didn't implement specific > > > ACLs for DLQs because I was not convinced they would help. So, if you > > > have a specific idea in mind, please > > > let me know. This is the area that I'm least comfortable with in the KIP. > > > > > > I suppose maybe to set the DLQ name for a group, you could need a higher > > > level of authorisation > > > than just ALTER_CONFIGS on the GROUP. But what I settled with in the KIP > > > was that DLQ topics > > > all start with the same prefix, defaulting to "dlq.", and that the topics > > > do not automatically create. > > > > > > D02: I can see that. I've added a config which I've called > > > errors.deadletterqueue.auto.create.topics.enable > > > just to have a consistent prefix on all of the config names. Let me know > > > what you think. > > > > > > D03: I've added some text about failure scenarios when attempting to > > > write records to the DLQ. > > > > > > Thanks, > > > Andrew > > > ________________________________________ > > > From: isding_l <[email protected]> > > > Sent: 16 July 2025 04:18 > > > To: dev <[email protected]> > > > Subject: Re: [DISCUSS]: KIP-1191: Dead-letter queues for share groups > > > > > > Hi Andrew, > > > Thanks for the nice KIP, This KIP design for introducing dead-letter > > > queues (DLQs) for Share Groups is generally clear and reasonable, > > > addressing the key pain points of handling "poison message". > > > > > > > > > D01: Should we consider implementing independent ACL configurations for > > > DLQs? This would enable separate management of DLQ topic read/write > > > permissions from source topics, preventing privilege escalation attacks > > > via "poison message" + DLQ mechanisms. > > > > > > > > > D02: While disabling automatic DLQ topic creation is justifiable for > > > security, it creates operational overhead in automated deployments. Can > > > we introduce a configuration parameter auto.create.dlq.topics.enable to > > > govern this behavior? > > > > > > > > > D03: How should we handle failure scenarios when brokers attempt to write > > > records to the DLQ? > > > ---- Replied Message ---- > > > | From | Andrew Schofield<[email protected]> | > > > | Date | 07/08/2025 17:54 | > > > | To | [email protected]<[email protected]> | > > > | Subject | [DISCUSS]: KIP-1191: Dead-letter queues for share groups | > > > Hi, > > > I'd like to start discussion on KIP-1191 which adds dead-letter queue > > > support for share groups. > > > Records which cannot be processed by consumers in a share group can be > > > automatically copied > > > onto another topic for a closer look. > > > > > > KIP: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups > > > > > > Thanks, > > > Andrew > > > >
