Lucas Brutschy created KAFKA-19529: -------------------------------------- Summary: State updater does not always seem to remove all metrics Key: KAFKA-19529 URL: https://issues.apache.org/jira/browse/KAFKA-19529 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 4.2.0 Reporter: Lucas Brutschy
In long-running tests, streams sometimes fails with this exception: {code:java} /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz:[2025-06-27 02:06:22,384] ERROR [i-0285945bbb1aa2330-StreamThread-3] stream-client [i-0285945bbb1aa2330] Encountered the following exception during processing and sent shutdown request for the entire application. (org.apache.kafka.streams.KafkaStreams) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-java.lang.IllegalArgumentException: A metric named 'MetricName [name=restore-records-rate, group=stream-state-updater-metrics, description=The average per-second number of records restored, tags={thread-id=i-0285945bbb1aa2330-StateUpdater-3}]' already exists, can't register another one. /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:344) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:317) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterMetrics.<init>(DefaultStateUpdater.java:1121) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterThread.<init>(DefaultStateUpdater.java:96) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.streams.processor.internals.DefaultStateUpdater.start(DefaultStateUpdater.java:794) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.streams.processor.internals.TaskManager.init(TaskManager.java:154) /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:885) {code} * This seems to happen, when we restart a stream thread, e.g. the stream thread is removed at some point and then restarted at a later point. * We see log lines before / after the code that removes metrics from a state updater thread * Every time, it is a different state updater metric that seems to crash, so it seems to be some kind of race condition * I found a recent change in state updater initialization: [https://github.com/apache/kafka/pull/19889] . I have digged into it, and also asked Matthias/Bruno to dig into it, but we couldn’t find a reason why this could cause any problem. We have reverted PR #19889 in 4.1 and this seems to have resolved the problem there. The problem still persists on trunk. We should either revert #19889 on trunk as well, or fix the PR. cc [~cadonna] [~bbejeck] -- This message was sent by Atlassian Jira (v8.20.10#820010)