[ https://issues.apache.org/jira/browse/KAFKA-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196289#comment-17196289 ]
Guozhang Wang commented on KAFKA-10484:
---------------------------------------

Thanks Bruno. I agree that providing some higher-level aggregates out of the box could be one option, but there have been discussions before about whether Streams should provide those vs. only provide the most fine-grained metrics and let users do the aggregation in their app.

Another thought for practically operating within this limit is to alert on the "num. of metrics" metric (each metrics registry would provide this special metric), either when it is close to the limit or when it is missing altogether, which probably means it has already been truncated due to limit violations.

> Reduce Metrics Exposed by Streams
> ---------------------------------
>
>                 Key: KAFKA-10484
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10484
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Bruno Cadonna
>            Priority: Major
>
> In our test cluster, metrics are monitored through a monitoring service. We experienced a couple of times that a Kafka Streams client exceeded the limit of 350 metrics of the monitoring service. When the client exceeds the limit, metrics are truncated, which can result in false alerts. For example, in our cluster we monitor the alive stream threads and trigger an alert if a stream thread dies. When the client exceeded the 350-metrics limit, the alive stream threads metric was truncated, which led to a false alarm.
>
> The main drivers of the high number of metrics are the metrics on task level and below, for example the state store metrics. The number of such metrics per Kafka Streams client is hard to predict since it depends on which tasks are assigned to the client. A stateful task with 5 state stores reports 5 times more state store metrics than a stateful task with only one state store. Sometimes it is possible to report the metrics of only some state stores, but sometimes this is not an option. For example, if we want to monitor the memory usage of RocksDB per Kafka Streams client, we need to report the memory-related metrics of all RocksDB state stores of all tasks assigned to all stream threads of one client.
>
> One option to reduce the number of reported metrics is to add a metric on client level within Kafka Streams that aggregates some state store metrics, e.g., to monitor memory usage.
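
A minimal sketch of the client-level aggregation idea from the description, done in the application rather than inside Streams, assuming the RocksDB memory metrics are exposed through KafkaStreams#metrics() under a state-store metric group. The group prefix and metric name below are assumptions and should be checked against the metric names of the Streams version in use:

{code:java}
import java.util.Map;

import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class RocksDBMemoryAggregator {

    /**
     * Sums one RocksDB memory metric (e.g. "size-all-mem-tables"; the exact name
     * is an assumption) across all state stores of all tasks assigned to the
     * given Kafka Streams client, yielding a single client-level value.
     */
    public static double aggregateStateStoreMetric(final KafkaStreams streams,
                                                   final String metricName) {
        double total = 0.0;
        for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            final MetricName name = entry.getKey();
            // "stream-state" is the assumed prefix of the state-store metric group.
            if (name.group().startsWith("stream-state") && name.name().equals(metricName)) {
                final Object value = entry.getValue().metricValue();
                if (value instanceof Number) {
                    total += ((Number) value).doubleValue();
                }
            }
        }
        return total;
    }
}
{code}

Whether such an aggregate should live inside Streams or in the application is exactly the trade-off mentioned in the comment above.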
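A sketch of the alerting idea from the comment as well: instead of relying on a registry-provided "num. of metrics" metric, this simply counts the metrics the Kafka Streams client currently reports and flags when the count approaches the monitoring service's limit (350 in this ticket). The threshold and warning ratio are illustrative:

{code:java}
import org.apache.kafka.streams.KafkaStreams;

public class MetricCountMonitor {

    // Limit of the external monitoring service from the ticket; adjust to
    // whatever limit your service actually enforces.
    private static final int METRIC_LIMIT = 350;
    private static final double WARN_RATIO = 0.9;

    /**
     * Returns true if the number of metrics currently registered by the
     * Kafka Streams client is close to the monitoring service's limit,
     * i.e. before truncation (and the resulting false alerts) kicks in.
     */
    public static boolean nearMetricLimit(final KafkaStreams streams) {
        final int numMetrics = streams.metrics().size();
        return numMetrics >= METRIC_LIMIT * WARN_RATIO;
    }
}
{code}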