Aayush Gupta created KAFKA-20594:
------------------------------------
Summary: Kafka Streams issue
Key: KAFKA-20594
URL: https://issues.apache.org/jira/browse/KAFKA-20594
Project: Kafka
Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Aayush Gupta
Looking for some Kafka Streams expertise on an issue we're investigating.
Problem: A KafkaStreams-based consumer stops polling after ~15 minutes of topic
inactivity. The adapter stays alive, but no messages are picked up until it's
manually restarted. Reproduces on both IBM MQ-backed Kafka and Confluent Cloud.
Suspected cause: back-to-back expiry of the client's connections.max.idle.ms
(9-minute default) and the broker's idle-connection timeout (~10 minutes). The
stream thread dies on the next poll attempt and KafkaStreams goes to the ERROR
state silently: no StateListener or UncaughtExceptionHandler was registered, so
nothing recovers.
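The two missing hooks can be registered before start(). The following is a minimal sketch that assumes the Kafka Streams 3.4 client API. The topic names, application id and bootstrap address are hypothetical placeholders, not values from this setup. The state listener only logs transitions, so a move to ERROR would at least appear in the logs. The handler returns REPLACE_THREAD (available since 2.8), which asks Kafka Streams to start a replacement stream thread instead of letting the client die.

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class StreamsHooks {

    // Hypothetical topology and placeholder config; substitute the adapter's real values.
    static KafkaStreams buildStreams(String applicationId) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");
        return new KafkaStreams(builder.build(), props);
    }

    // Registers both hooks; Kafka Streams only accepts them while the instance is in CREATED state.
    static void attachHooks(KafkaStreams streams) {
        // Log every client state transition so that a move to ERROR shows up in the logs.
        streams.setStateListener((newState, oldState) ->
                System.err.printf("KafkaStreams state %s -> %s%n", oldState, newState));

        // Log whatever killed the stream thread, then ask Kafka Streams to start a replacement thread.
        StreamsUncaughtExceptionHandler handler = exception -> {
            exception.printStackTrace();
            return StreamThreadExceptionResponse.REPLACE_THREAD;
        };
        streams.setUncaughtExceptionHandler(handler);
    }
}
```

Usage would be `KafkaStreams streams = buildStreams(...); attachHooks(streams); streams.start();`. The handler only runs when an exception escapes a stream thread, so it would not fire for a thread that stalls without throwing.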
Questions:
1. Is this a known pattern with KafkaStreams on idle topics? Is there a
   recommended approach?
2. Is the close() + re-instantiate pattern safe? Are there any rebalance or
   duplicate-processing risks?
3. For Kafka 2.8+, should we prefer a StreamsUncaughtExceptionHandler returning
   REPLACE_THREAD over a full restart?
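For comparison with REPLACE_THREAD, the close() + re-instantiate pattern from question 2 could look like the following sketch. The `RestartingStreams` class is hypothetical and assumes the Kafka Streams 3.x API. A closed KafkaStreams instance cannot be started again, so the sketch builds a fresh one from a factory. It also runs the restart on its own executor to avoid calling close() on the stream thread that delivers the state-change callback. A new instance with the same application.id rejoins the consumer group, so a rebalance is expected. Under the default at_least_once guarantee, records processed after the last committed offset may be processed again.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

import org.apache.kafka.streams.KafkaStreams;

// Hypothetical supervisor that replaces the whole KafkaStreams client once it reaches ERROR.
public class RestartingStreams {
    private final Supplier<KafkaStreams> factory;
    // Restarts run on a dedicated thread so close() is never called from the
    // stream thread that delivers the state-change callback.
    private final ExecutorService restarter = Executors.newSingleThreadExecutor();
    private volatile KafkaStreams current;

    public RestartingStreams(Supplier<KafkaStreams> factory) {
        this.factory = factory;
    }

    public synchronized void start() {
        // A closed KafkaStreams instance cannot be restarted, so always build a new one.
        KafkaStreams streams = factory.get();
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR && !restarter.isShutdown()) {
                restarter.submit(this::restart);
            }
        });
        current = streams;
        streams.start();
    }

    private synchronized void restart() {
        // Release the failed client's resources (bounded wait), then start a fresh instance.
        current.close(Duration.ofSeconds(30));
        start();
    }

    public synchronized void shutdown() {
        restarter.shutdownNow();
        if (current != null) {
            current.close(Duration.ofSeconds(30));
        }
    }

    public KafkaStreams.State state() {
        return current == null ? null : current.state();
    }
}
```

REPLACE_THREAD keeps the client and its other threads alive, while this pattern tears everything down. The full restart is heavier but also covers failures that leave the client in ERROR.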
Any input appreciated!
We are not observing any error stack traces or exceptions in the logs when the issue occurs.
As part of our investigation, we wanted to check if there are any JVM flags or
framework-level configurations that can be enabled to extract more detailed
Kafka framework debug logs, particularly around Kafka Streams and consumer
lifecycle behavior.
Given the absence of exceptions or diagnostic logs, it is unclear whether
further tuning of Kafka consumer properties alone would meaningfully alleviate
the issue without better visibility into the Kafka Streams internals.
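One way to get visibility without relying on exceptions is to log a periodic snapshot of the client's own view of itself. The sketch below assumes the Kafka Streams 3.x API (metadataForLocalThreads() exists since 3.0). It reads the consumer metric last-poll-seconds-ago (group consumer-metrics) through KafkaStreams.metrics(), which also exposes the embedded clients' metrics. The `StreamsWatchdog` name, the one-minute interval and the output format are arbitrary, hypothetical choices. If the client kept reporting RUNNING while last-poll-seconds-ago kept growing, that would suggest a stalled poll loop rather than a dead thread.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.ThreadMetadata;

public class StreamsWatchdog {

    // One-line snapshot: client state, stream-thread states, and how long ago each embedded consumer polled.
    static String snapshot(KafkaStreams streams) {
        KafkaStreams.State state = streams.state();
        StringBuilder sb = new StringBuilder("client=").append(state);
        // Thread metadata is only queried while the client is up.
        if (state == KafkaStreams.State.RUNNING || state == KafkaStreams.State.REBALANCING) {
            for (ThreadMetadata thread : streams.metadataForLocalThreads()) {
                sb.append(' ').append(thread.threadName()).append('=').append(thread.threadState());
            }
        }
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            // Seconds since this consumer last called poll().
            if ("consumer-metrics".equals(name.group()) && "last-poll-seconds-ago".equals(name.name())) {
                sb.append(' ').append(name.tags().get("client-id"))
                  .append(".last-poll-seconds-ago=").append(entry.getValue().metricValue());
            }
        }
        return sb.toString();
    }

    // Prints a snapshot once a minute; the caller owns the returned scheduler and shuts it down.
    static ScheduledExecutorService startWatchdog(KafkaStreams streams) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.err.println(snapshot(streams));
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log it and keep going.
                System.err.println("watchdog snapshot failed: " + e);
            }
        }, 1, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```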
From your experience, do you have any recommendations beyond enabling more
detailed Kafka/Streams logging, such as known patterns, specific stream-thread
behaviors, or client-side recovery considerations, that we should be exploring
in parallel?
Any guidance would be greatly appreciated.