[ https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927122#comment-17927122 ]
Dmitry Konstantinov edited comment on CASSANDRA-20250 at 2/14/25 1:29 PM:
--------------------------------------------------------------------------

As usual, thank you for the additional ideas!

So, I think the next steps for me are the following:
* create a separate ticket for the histogram improvements
* update the current ticket description to reflect the actual changes we discussed in the comments, and create a separate ticket for disabling metrics
* try moving the average to another, non-thread-local array to improve fetching/caching during the bulk update
* finalize the metric id release logic for when metrics are unregistered from the registry. I worry about cases where we return a metric id but some concurrent JMX calls may still touch it; I will probably introduce a cool-down period for a metric id before it is reused.
* add javadocs to the added classes

Do you see any other major things to do for this ticket before switching to a review phase?

> Provide the ability to disable specific metrics collection
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: 5.1_profile_cpu.html,
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html,
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip,
> cpu_profile_insert.html, jmh-result.json, vmstat.log,
> vmstat_without_metrics.log
>
>
> Cassandra collects a lot of metrics. Many of them are collected per
> table, so the number of metric instances is multiplied by the number of
> tables. On the one hand this gives better observability; on the other
> hand metrics are not free, there is an overhead associated with them:
> 1) CPU overhead: under a simple CPU-bound load I already see about 5.5% of
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for
> a write load.
> Example: [^cpu_profile_insert.html] (search by the "codahale" pattern). The
> flame graph was captured using the Async profiler build
> async-profiler-3.0-29ee888-linux-x64
> 2) Memory overhead: we spend memory on the entities used to aggregate
> metrics, such as LongAdders and reservoirs, and on MBeans (String
> concatenation within object names is a major cause of it: a new String is
> created for each table+metric name combination)
>
> The idea of this ticket is to allow an operator to configure a list of
> disabled metrics in cassandra.yaml, like:
> {code:java}
> disabled_metrics:
>   - metric_a
>   - metric_b
> {code}
> From the implementation point of view I see two possible approaches (which
> can be combined):
> # Generic: when a metric is being registered, if it is listed in
> disabled_metrics we do not publish it via JMX and we provide a noop
> implementation of the metric object (such as a histogram) for it.
> Logging analogy: the log level check within a log method
> # Specialized: for some metrics the value calculation itself is not free
> and introduces an overhead as well. In such cases it would be useful to
> check within the specific logic, using an API (like: isMetricEnabled),
> whether we need to do it. Example of such a metric:
> ClientRequestSizeMetrics.recordRowAndColumnCountMetrics
> Logging analogy: an explicit 'if (isDebugEnabled())' condition used when a
> message parameter is expensive.
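
The two approaches from the ticket description can be sketched together as below. This is a minimal illustration only, not the patch under discussion: DisabledMetricsSketch, Counter, LongAdderCounter, NOOP, Registry and expensiveRowCount are hypothetical names made up for the example, and it deliberately avoids the real Dropwizard and Cassandra classes. Only disabled_metrics and isMetricEnabled are taken from the ticket text.

```java
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; none of these types are Cassandra or Dropwizard APIs. */
final class DisabledMetricsSketch {

    /** Minimal counter abstraction standing in for a real metric type. */
    interface Counter {
        void inc(long n);
        long count();
    }

    /** Real implementation, backed by a LongAdder. */
    static final class LongAdderCounter implements Counter {
        private final LongAdder adder = new LongAdder();
        @Override public void inc(long n) { adder.add(n); }
        @Override public long count() { return adder.sum(); }
    }

    /** Shared no-op instance handed out for every disabled metric. */
    static final Counter NOOP = new Counter() {
        @Override public void inc(long n) { /* intentionally empty */ }
        @Override public long count() { return 0L; }
    };

    /** Assumed registry that knows the configured disabled_metrics list. */
    static final class Registry {
        private final Set<String> disabled;

        Registry(Set<String> disabledMetrics) {
            this.disabled = Set.copyOf(disabledMetrics);
        }

        /** Specialized approach: lets callers skip costly value calculation. */
        boolean isMetricEnabled(String name) {
            return !disabled.contains(name);
        }

        /** Generic approach: a disabled metric gets the no-op object. */
        Counter counter(String name) {
            return isMetricEnabled(name) ? new LongAdderCounter() : NOOP;
        }
    }

    /** Placeholder for a calculation that is expensive in a real system. */
    static long expensiveRowCount() {
        return 3L;
    }

    public static void main(String[] args) {
        Registry registry = new Registry(Set.of("metric_a"));
        Counter a = registry.counter("metric_a"); // disabled, so no-op
        Counter b = registry.counter("metric_b"); // enabled

        a.inc(5); // silently ignored
        b.inc(5);

        // Guard the expensive calculation, like 'if (isDebugEnabled())'.
        if (registry.isMetricEnabled("metric_b")) {
            b.inc(expensiveRowCount());
        }
        System.out.println("metric_a=" + a.count() + " metric_b=" + b.count());
    }
}
```

Sharing a single no-op instance means a disabled metric allocates no per-table aggregation state, which is the memory saving the description is after. The explicit isMetricEnabled guard is only worth adding where computing the recorded value is itself costly.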