[
https://issues.apache.org/jira/browse/KAFKA-19678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031529#comment-18031529
]
Steven Schlansker commented on KAFKA-19678:
-------------------------------------------
[~mjsax], would it make sense to move the state store oldest open iterator
metric from the current INFO level to DEBUG? That would resolve the issue as far
as we are concerned; we are happy to accept this kind of overhead when debugging
(now that the leak is fixed).
> Streams open iterator tracking has high contention on metrics lock
> ------------------------------------------------------------------
>
> Key: KAFKA-19678
> URL: https://issues.apache.org/jira/browse/KAFKA-19678
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 4.1.0
> Reporter: Steven Schlansker
> Priority: Major
> Attachments: image-2025-09-05-12-13-24-910.png,
> image-2025-10-20-13-36-54-857.png, image-2025-10-21-09-24-02-505.png
>
>
> We run Kafka Streams 4.1.0 with custom processors that heavily use state
> store range iterators.
> While investigating disappointing performance, we found a surprising source
> of lock contention.
> Over the course of a roughly 1-minute profiler sample, the
> {{org.apache.kafka.common.metrics.Metrics}} lock is taken approximately
> 40,000 times and blocks threads for about 1 minute.
> This appears to be because our state stores generally have no iterators open,
> except when their processor is processing a record, in which case it opens an
> iterator (taking the lock through {{OpenIterators.add}} into
> {{Metrics.registerMetric}}), does a tiny bit of work, and then closes the
> iterator (again taking the lock through {{OpenIterators.remove}} into
> {{Metrics.removeMetric}}).
> So, stream processing threads take a globally shared lock twice per record,
> for this subset of our data. I've attached a profiler thread state
> visualization with our findings - the red bar indicates the thread was
> blocked during the sample on this lock. As you can see, this lock seems to be
> severely hampering our performance.
>
> !image-2025-09-05-12-13-24-910.png!
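
The open/close pattern described in the quoted report can be illustrated with a
small standalone sketch. It does not use Kafka's classes: SharedRegistry,
TrackedIterators, and the metric label are hypothetical stand-ins, and the sketch
assumes (as the report suggests) that the metric is registered when the open
iterator count goes from 0 to 1 and removed when it returns to 0. Under that
assumption, each record costs two acquisitions of the shared lock.

```java
import java.util.HashMap;
import java.util.Map;

final class OpenIteratorContentionSketch {

    /** Hypothetical stand-in for a metrics registry guarded by one shared lock. */
    static final class SharedRegistry {
        private final Map<String, Long> gauges = new HashMap<>();
        private long lockAcquisitions = 0;

        // Both mutators synchronize on the registry, so every caller in the
        // process competes for the same monitor.
        synchronized void register(String name, long openedAtMs) {
            lockAcquisitions++;
            gauges.put(name, openedAtMs);
        }

        synchronized void remove(String name) {
            lockAcquisitions++;
            gauges.remove(name);
        }

        // Number of register/remove calls made so far.
        synchronized long lockAcquisitions() {
            return lockAcquisitions;
        }
    }

    /** Hypothetical per-store tracker: registers on first open, removes on last close. */
    static final class TrackedIterators {
        private final SharedRegistry registry;
        private final String metricName;
        private int open = 0;

        TrackedIterators(SharedRegistry registry, String metricName) {
            this.registry = registry;
            this.metricName = metricName;
        }

        void add() {
            if (open++ == 0) {
                registry.register(metricName, System.currentTimeMillis());
            }
        }

        void remove() {
            if (--open == 0) {
                registry.remove(metricName);
            }
        }
    }

    /** Processes {@code records} records, each opening and closing one iterator. */
    static long lockAcquisitionsFor(int records) {
        SharedRegistry registry = new SharedRegistry();
        // "oldest-open-iterator" is an illustrative label, not a real metric name.
        TrackedIterators tracker = new TrackedIterators(registry, "oldest-open-iterator");
        for (int i = 0; i < records; i++) {
            tracker.add();    // open the range iterator: 0 -> 1, registers the gauge
            // ... a tiny bit of per-record work would happen here ...
            tracker.remove(); // close it: 1 -> 0, removes the gauge
        }
        return registry.lockAcquisitions();
    }

    public static void main(String[] args) {
        System.out.println(lockAcquisitionsFor(1_000)); // prints 2000
    }
}
```

A store that keeps at least one iterator open would not hit the lock on every
record in this sketch; the cost comes from repeatedly crossing the 0/1 boundary.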