DLQ Flow Visualization 1. Record receives REJECT ack or reaches maxDeliveryCount 2. Record state transitions: ACQUIRED -> ARCHIVING (3) 3. SharePartition calls DLQHandler.enqueue(pendingRecord) - DLQHandler has: bounded queue (10K), dedicated thread pool, idempotent producer 4. DLQHandler writes record to DLQ topic asynchronously 5. On callback (success/failure): Record state ARCHIVING -> ARCHIVED (4)
SharePartition -> ARCHIVING(3)-> DLQHandler.enqueue() -> [async write] -> callback -> ARCHIVED(4) ---- The overall KIP looks solid for the initial design, but let's brainstorm some scenarios and design choices: - DLQ writes (may) block coordinator → heartbeat delays → consumer timeouts - Decouple DLQ write and handle slowness, retry logic - Unlimited DLQ backlog → OutOfMemoryError - Network retry causes duplicate records in DLQ few components during implementation, that can help : ShareGroupDLQHandler => Main handler with bounded queue, writer pool, idempotent producer - Bounded queue: Non-blocking enqueue(), returns immediately - Bounded queue: Fixed capacity (10K), rejects when full - Writer pool: Isolated threads, coordinator threads never wait - Idempotent producer: Broker deduplicates by (PID, sequence) DLQWriterTask => Worker thread that polls and writes with retry - Callback: Invoked when write completes → triggers ARCHIVING → ARCHIVED - Background poll: Writer thread does the waiting, not request thread and handles the retry attempts/ DLQCycleDetector => Prevents DLQ to DLQ loops Regards,Shekharrajakhttps://github.com/shekharrajak On Tuesday 8 July 2025 at 03:26:40 pm GMT+5:30, Andrew Schofield <[email protected]> wrote: Hi, I'd like to start discussion on KIP-1191 which adds dead-letter queue support for share groups. Records which cannot be processed by consumers in a share group can be automatically copied onto another topic for a closer look. KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups Thanks, Andrew
