[
https://issues.apache.org/jira/browse/KAFKA-17040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904232#comment-17904232
]
Lianet Magrans edited comment on KAFKA-17040 at 12/9/24 6:02 PM:
-----------------------------------------------------------------
Hey [~apoorvmittal10], in case it helps: I believe this issue happens when the
consumer close cannot wait for the network thread to finish closing (e.g. close
with a low timeout, or the thread is interrupted). The flow is:
# The async consumer app thread triggers the close of the network thread and
blocks until it completes (it won't wait if interrupted or if the timeout is low)
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1323]
# The async consumer app thread moves on and closes the telemetry reporter
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1335]
# The background thread, still running the network thread close, reaches the
point where it polls the client to send any unsent requests it has before
closing
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L309]
With that sequence, I expect we end up trying to update a telemetry reporter
that is already TERMINATED, so this line would throw when we poll the
network client:
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L643]
Does that make sense? I was looking at a flaky test we have that involves an
interrupt and saw this error a lot; it may help here:
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L834]
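The sequence above can be sketched as a small standalone Java program. This is only an illustration of the suspected interleaving, not Kafka code: the {{Reporter}} class, its two-value {{State}} enum, the latch, and the 1 ms join timeout are all hypothetical stand-ins chosen to make the race deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class TelemetryCloseRaceSketch {
    // Hypothetical, simplified states; the real telemetry reporter has more.
    enum State { RUNNING, TERMINATED }

    // Hypothetical stand-in for the telemetry reporter/sender.
    static final class Reporter {
        private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

        void close() {
            state.set(State.TERMINATED);
        }

        long timeToNextUpdate() {
            State current = state.get();
            switch (current) {
                case RUNNING:
                    return 100L;
                default:
                    // TERMINATED is not handled, mirroring the reported error.
                    throw new IllegalStateException("Unknown telemetry state: " + current);
            }
        }
    }

    // Returns the exception seen by the background thread, or null if none.
    static Throwable simulate() {
        Reporter reporter = new Reporter();
        CountDownLatch reporterClosed = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread background = new Thread(() -> {
            try {
                // The latch forces the "cleanup is still running" interleaving.
                reporterClosed.await();
                // Final poll to flush unsent requests consults the reporter.
                reporter.timeToNextUpdate();
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "consumer_background_thread");
        background.start();

        try {
            // Step 1: app thread waits with a very low timeout and gives up.
            background.join(1);
            // Step 2: app thread moves on and closes the reporter.
            reporter.close();
            reporterClosed.countDown();
            // Step 3: background thread now polls against a TERMINATED reporter.
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("Background thread failure: " + simulate());
    }
}
```

In this sketch the failure disappears if the app thread waits for the background thread to finish before closing the reporter, which is consistent with the error only showing up on low-timeout or interrupted closes.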
> Unknown telemetry state: TERMINATED thrown when closing AsyncKafkaConsumer
> --------------------------------------------------------------------------
>
> Key: KAFKA-17040
> URL: https://issues.apache.org/jira/browse/KAFKA-17040
> Project: Kafka
> Issue Type: Bug
> Components: clients, metrics
> Affects Versions: 3.9.0
> Reporter: Kirk True
> Assignee: Apoorv Mittal
> Priority: Major
>
> An error is occasionally thrown when closing the {{AsyncKafkaConsumer}}:
> {noformat}
> [ERROR] 2024-06-20 17:13:54,121 [consumer_background_thread] org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread lambda$configureThread$0 - Uncaught exception in thread 'consumer_background_thread':
> java.lang.IllegalStateException: Unknown telemetry state: TERMINATED
> at org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.timeToNextUpdate(ClientTelemetryReporter.java:363)
> at org.apache.kafka.clients.NetworkClient$TelemetrySender.maybeUpdate(NetworkClient.java:1392)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:668)
> at org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:143)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.sendUnsentRequests(ConsumerNetworkThread.java:299)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:318)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:105){noformat}
> The issue appears to be that the {{TERMINATED}} state is not expected in the
> switch statement inside
> [timeToNextUpdate()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/telemetry/internals/ClientTelemetryReporter.java#L307].
> As an aside, the error message might make more sense to be written as
> "{_}Unexpected{_} telemetry state" instead of "{_}Unknown{_} telemetry state"
> since {{TERMINATED}} is a known state, but heretofore unexpected.
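To illustrate the description above, here is a minimal sketch of a switch that does not expect {{TERMINATED}}, next to one that does. The three-value enum, the method names, and the choice to return {{Long.MAX_VALUE}} for a terminated reporter are assumptions made for illustration; this is not the actual {{ClientTelemetryReporter}} code or the fix that was applied.

```java
class TelemetryStateSwitchSketch {
    // Hypothetical, reduced set of states for illustration only.
    enum State { SUBSCRIPTION_NEEDED, PUSH_NEEDED, TERMINATED }

    // Mirrors the reported behavior: TERMINATED falls through to the default.
    static long timeToNextUpdateStrict(State state, long intervalMs) {
        switch (state) {
            case SUBSCRIPTION_NEEDED:
            case PUSH_NEEDED:
                return intervalMs;
            default:
                throw new IllegalStateException("Unknown telemetry state: " + state);
        }
    }

    // One possible alternative: treat TERMINATED as "no further updates".
    static long timeToNextUpdateTolerant(State state, long intervalMs) {
        switch (state) {
            case SUBSCRIPTION_NEEDED:
            case PUSH_NEEDED:
                return intervalMs;
            case TERMINATED:
                // Assumption: a terminated reporter never needs another update.
                return Long.MAX_VALUE;
            default:
                throw new IllegalStateException("Unexpected telemetry state: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(timeToNextUpdateTolerant(State.TERMINATED, 100L));
        try {
            timeToNextUpdateStrict(State.TERMINATED, 100L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Handling the state explicitly keeps the default branch reserved for states that are genuinely unanticipated, which also fits the suggested "Unexpected" wording.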