[
https://issues.apache.org/jira/browse/KAFKA-20594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086758#comment-18086758
]
Matthias J. Sax commented on KAFKA-20594:
-----------------------------------------
Possible. Hard to say, but 3.4.0 is getting old... If nobody can reproduce this
issue on a newer version, and given that KAFKA-17299 is fixed in 3.7.2 and newer,
it is reasonable to assume that this is the same issue and already fixed, so
we could close this ticket.
> Kafka Streams issue
> -------------------
>
> Key: KAFKA-20594
> URL: https://issues.apache.org/jira/browse/KAFKA-20594
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Aayush Gupta
> Priority: Blocker
>
> Looking for some Kafka Streams expertise on an issue we're investigating.
> Problem: A KafkaStreams-based consumer stops polling after ~15 minutes of
> topic inactivity. The adapter stays alive, but no messages are picked up
> until it's manually restarted. Reproduces on both IBM MQ-backed Kafka and
> Confluent Cloud.
> Suspected cause: Back-to-back expiry of connections.max.idle.ms (client, 9 min
> default) and the broker idle timeout (~10 min). The stream thread dies on the
> next poll attempt and KafkaStreams goes to the ERROR state silently. No
> StateListener or UncaughtExceptionHandler was registered, so nothing recovers.
> Questions:
> 1. Is this a known pattern with KafkaStreams on idle topics? Any recommended
>    approach?
> 2. Is the close() + re-instantiate pattern safe? Any rebalance/duplicate risks?
> 3. For Kafka 2.8+, should we prefer StreamsUncaughtExceptionHandler with
>    REPLACE_THREAD instead of a full restart?
> Any input appreciated!
> We are not observing any error stack traces or exceptions in the logs when
> the issue occurs.
> As part of our investigation, we wanted to check if there are any JVM flags
> or framework-level configurations that can be enabled to extract more
> detailed Kafka framework debug logs, particularly around Kafka Streams and
> consumer lifecycle behavior.
> Given the absence of exceptions or diagnostic logs, it is unclear whether
> further tuning of Kafka consumer properties alone would meaningfully
> alleviate the issue without better visibility into the Kafka Streams
> internals.
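On the logging question above: Kafka clients and Kafka Streams log through
SLF4J, so verbosity is controlled by the logging backend's configuration. A
minimal sketch, assuming the application uses a log4j 1.x style properties file
(an assumption; other backends need the equivalent syntax). The logger names
are Kafka package and class names.

```properties
# Assumes a log4j 1.x / reload4j properties file; adapt for other backends.
# Stream thread and task lifecycle
log4j.logger.org.apache.kafka.streams=DEBUG
# Consumer poll, fetch, and group coordination
log4j.logger.org.apache.kafka.clients.consumer=DEBUG
# Connection setup and disconnects
log4j.logger.org.apache.kafka.clients.NetworkClient=DEBUG
```

DEBUG on these loggers can be very verbose, so it is probably best enabled only
around the idle window in which the stall is expected.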
> From your experience, do you have any recommendations beyond enabling more
> detailed Kafka/Streams logging—such as known patterns, specific stream-thread
> behaviors, or client-side recovery considerations—that we should be exploring
> in parallel?
> Any guidance would be greatly appreciated.
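One way to probe the suspected cause described above is to make the client idle
timeout explicit and vary it: if the stall moves with the configured value, the
idle-connection theory gains weight. A minimal sketch using plain
java.util.Properties. The class name IdleTimeoutProbeConfig, the application id,
and the broker address are hypothetical; the property keys are the real Kafka
config names.

```java
import java.time.Duration;
import java.util.Properties;

public class IdleTimeoutProbeConfig {

    /** Builds Streams properties with an explicit client-side idle timeout. */
    public static Properties build(String bootstrapServers, Duration clientIdleTimeout) {
        Properties props = new Properties();
        // Required Streams settings; the application id is a placeholder.
        props.setProperty("application.id", "idle-timeout-probe");
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Real client config name; the Kafka client default is 540000 ms (9 minutes).
        props.setProperty("connections.max.idle.ms",
                Long.toString(clientIdleTimeout.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and a shortened idle timeout for the experiment.
        Properties props = build("localhost:9092", Duration.ofMinutes(2));
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties would be passed to the KafkaStreams constructor along
with the topology. This only helps test the hypothesis; it is not a confirmed
fix for the stall.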