Aayush Gupta created KAFKA-20594:
------------------------------------

             Summary: Kafka Streams issue
                 Key: KAFKA-20594
                 URL: https://issues.apache.org/jira/browse/KAFKA-20594
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.4.0
            Reporter: Aayush Gupta


Looking for some Kafka Streams expertise on an issue we're investigating.

Problem: A KafkaStreams-based consumer stops polling after ~15 minutes of topic 
inactivity. The adapter stays alive, but no messages are picked up until it's 
manually restarted. Reproduces on both IBM MQ-backed Kafka and Confluent Cloud.

Suspected cause: back-to-back expiry of connections.max.idle.ms (client, 
9 min default) and the broker idle timeout (~10 min). The stream thread dies 
on the next poll attempt and KafkaStreams moves to the ERROR state silently. 
No StateListener or UncaughtExceptionHandler was registered, so nothing 
recovers.

Questions:

1. Is this a known pattern with KafkaStreams on idle topics? Any recommended 
   approach?
2. Is the close() + re-instantiate pattern safe? Any rebalance/duplicate risks?
3. For Kafka 2.8+, should we prefer StreamsUncaughtExceptionHandler with 
   REPLACE_THREAD instead of a full restart?

Any input appreciated!
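
To make the close() + re-instantiate question concrete, here is a minimal Java 
sketch of how such a restart could be sequenced. It is an illustration only, 
not code from our adapter: ManagedStreams is a hypothetical stand-in for the 
two KafkaStreams calls involved (start() and close(Duration), which reports 
whether shutdown completed in time), and StreamsSupervisor is an illustrative 
name. The idea is that a replacement is created only after the old instance 
confirms it has closed, so this process should never run two instances with 
the same application.id at once.

```java
import java.time.Duration;
import java.util.function.Supplier;

class StreamsSupervisor {

    /** Hypothetical stand-in for the KafkaStreams methods this sketch needs. */
    interface ManagedStreams {
        void start();
        /** Returns true if shutdown completed within the timeout. */
        boolean close(Duration timeout);
    }

    private final Supplier<ManagedStreams> factory;
    private final Duration closeTimeout;
    private ManagedStreams current;
    private int restarts;

    StreamsSupervisor(Supplier<ManagedStreams> factory, Duration closeTimeout) {
        this.factory = factory;
        this.closeTimeout = closeTimeout;
    }

    /** Creates and starts the first instance. */
    synchronized void start() {
        if (current != null) {
            throw new IllegalStateException("already started");
        }
        current = factory.get();
        current.start();
    }

    /**
     * Closes the current instance and starts a fresh one.
     * Returns false, without creating a replacement, if the old instance
     * did not finish closing within the timeout.
     */
    synchronized boolean restart() {
        if (current != null && !current.close(closeTimeout)) {
            return false; // old instance may still be in the group; try again later
        }
        current = factory.get();
        current.start();
        restarts++;
        return true;
    }

    synchronized int restarts() {
        return restarts;
    }
}
```

In a real application the factory would presumably wrap 
new KafkaStreams(topology, props), and restart() would be triggered from a 
separate thread once the ERROR state is observed, not from the stream thread 
itself. Whether this avoids duplicates after the rebalance is exactly what we 
are asking about.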



We are not observing any stack traces or exceptions in the logs when the issue 
occurs. As part of our investigation, we would like to know whether there are 
any JVM flags or framework-level configurations that can be enabled to extract 
more detailed Kafka debug logs, particularly around Kafka Streams and consumer 
lifecycle behavior.
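
As one example of the kind of switch we have in mind, the sketch below raises 
the Kafka Streams, consumer, and network-client loggers to DEBUG through system 
properties. It assumes the slf4j-simple binding is the SLF4J backend on the 
classpath, which may not match a given deployment; with Log4j or Logback the 
same logger names would be set in that framework's own configuration instead. 
KafkaDebugLogging is an illustrative name, and the properties must be applied 
before the first Kafka class creates a logger.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KafkaDebugLogging {

    /** System properties understood by the slf4j-simple binding (assumed backend). */
    static Map<String, String> debugProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Timestamps make it easier to line up log gaps with the idle window.
        props.put("org.slf4j.simpleLogger.showDateTime", "true");
        // Stream thread and KafkaStreams state transitions.
        props.put("org.slf4j.simpleLogger.log.org.apache.kafka.streams", "debug");
        // Consumer poll, heartbeat, and rebalance activity.
        props.put("org.slf4j.simpleLogger.log.org.apache.kafka.clients.consumer", "debug");
        // Connection setup and disconnects, including idle closes.
        props.put("org.slf4j.simpleLogger.log.org.apache.kafka.clients.NetworkClient", "debug");
        return props;
    }

    /** Applies the properties to the running JVM; call before any Kafka client is created. */
    static void apply() {
        debugProperties().forEach(System::setProperty);
    }
}
```

The same settings can be passed as JVM flags, for example 
-Dorg.slf4j.simpleLogger.log.org.apache.kafka.streams=debug.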


Given the absence of exceptions or diagnostic logs, it is unclear whether 
further tuning of Kafka consumer properties alone would meaningfully alleviate 
the issue without better visibility into the Kafka Streams internals.
From your experience, do you have any recommendations beyond enabling more 
detailed Kafka/Streams logging (such as known patterns, specific stream-thread 
behaviors, or client-side recovery considerations) that we should be exploring 
in parallel?
Any guidance would be greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
