[jira] [Commented] (HDDS-15552) Ratis events should not be published as metrics

Roland Elek (Jira) Tue, 16 Jun 2026 07:46:00 -0700


    [ https://issues.apache.org/jira/browse/HDDS-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089396#comment-18089396 ]


Roland Elek commented on HDDS-15552:
------------------------------------

Besides the syntax implications, Prometheus creates a new time series for each
distinct set of label values. This matters because its resource footprint
scales primarily with the number of active time series. A label whose value
changes frequently over an unbounded domain (such as a timestamp, a container
ID, or a list of recent state machine events as arbitrary strings, with or
without these fields) is therefore very expensive.
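
To illustrate the point about label values, here is a minimal Python sketch that
counts series the way described above: one per distinct combination of metric
name and label values. The metric names (ratis_role, ratis_event) and the label
names are hypothetical and are not taken from Ozone or Ratis.

```python
def count_series(samples):
    """Count distinct time series: one per unique (metric name, label set)."""
    return len({(name, frozenset(labels.items())) for name, labels in samples})


# Bounded label: the same few values repeat from scrape to scrape.
bounded = [
    ("ratis_role", {"role": role})
    for role in ["LEADER", "FOLLOWER", "LEADER", "FOLLOWER"]
]

# Unbounded label: every scrape carries a value never seen before
# (here a timestamp that advances by 15 seconds per scrape).
unbounded = [
    ("ratis_event", {"timestamp": str(1000 + 15 * i)})
    for i in range(4)
]

print(count_series(bounded))    # 2
print(count_series(unbounded))  # 4
```

With the bounded label the series count stops growing after the first few
scrapes; with the unbounded one it grows with every scrape.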

With 3 OMs, 3 SCMs, 15 days of retention, and a 15-second scrape interval, we
get about 500k time series (assuming the label value changes on every scrape).
That is manageable on its own, but decidedly significant for a single
Prometheus instance.
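
One way to arrive at the 500k figure, sketched in Python. It assumes that every
scrape of every target produces exactly one previously unseen label value, and
therefore one new series; that assumption is a reading of the comment, not
something it states explicitly.

```python
RETENTION_DAYS = 15
SCRAPE_INTERVAL_SECONDS = 15
TARGETS = 3 + 3  # 3 OMs + 3 SCMs

# Number of scrapes of a single target that fall inside the retention window.
scrapes_per_target = RETENTION_DAYS * 24 * 60 * 60 // SCRAPE_INTERVAL_SECONDS

# Assumption: each scrape of each target adds one new series.
series_in_retention = scrapes_per_target * TARGETS

print(scrapes_per_target)   # 86400
print(series_in_retention)  # 518400
```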

> Ratis events should not be published as metrics
> -----------------------------------------------
>
>                 Key: HDDS-15552
>                 URL: https://issues.apache.org/jira/browse/HDDS-15552
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Ethan Rose
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: pull-request-available
>
> HDDS-13133 started tracking Ratis events (arbitrary strings) as metrics. 
> These then get exposed over JMX and Prometheus. This completely prevents 
> Prometheus from scraping these endpoints because it fails when any of the 
> messages have invalid characters like " or \n. We can keep the list of events 
> in memory and maintain the web UI functionality without exposing it as a 
> metric.
> Additionally, to verify this change, we should add an acceptance test call to
> {{GET http://<prometheus-host>:9090/api/v1/targets}} and ensure that
> {{health=up}} for each component to prevent future regressions like this.
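
A rough Python sketch of the check proposed in the description, not the actual
Ozone acceptance test. It relies on the Prometheus HTTP API returning targets
under data.activeTargets, each with a health field and a scrapeUrl field. The
function names fetch_targets and unhealthy_targets, and the sample payload, are
made up for illustration.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """GET <base_url>/api/v1/targets and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


def unhealthy_targets(payload):
    """Return the scrape URLs of active targets whose health is not 'up'."""
    active = payload["data"]["activeTargets"]
    return [
        target.get("scrapeUrl", "<unknown>")
        for target in active
        if target.get("health") != "up"
    ]


# Hypothetical response, trimmed to the fields the check reads.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://om1:9874/prom", "health": "up"},
            {"scrapeUrl": "http://scm1:9876/prom", "health": "down"},
        ]
    },
}

print(unhealthy_targets(sample))  # ['http://scm1:9876/prom']
```

Against a live cluster, the test would call
unhealthy_targets(fetch_targets("http://<prometheus-host>:9090")) and fail if
the returned list is not empty.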



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

