[jira] [Commented] (CASSANDRA-20250) Provide the ability to disable specific metrics collection

Dmitry Konstantinov (Jira) Fri, 14 Feb 2025 08:58:10 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927128#comment-17927128
 ]


Dmitry Konstantinov commented on CASSANDRA-20250:
-------------------------------------------------

{quote}I will have further suggestions once I have time to properly review it
{quote}
Yes, of course, that is clear.
{quote}I think when it comes to reclaiming ids, we should keep it simple. If we 
have cleaned up all of the old metrics due to the threads all referencing it 
being dead, we can immediately re-use the id. I would consider maintaining a 
{{BitSet}} of ids where we know this is true, that is maintained in a 
synchronised block.
{quote}
We are probably talking about different things. I mean the case where metrics 
are created and released dynamically: when a table is created we create a set 
of metrics, and when the table is dropped we release them. The threads that 
hold ThreadLocalMetrics are usually still alive in this case. To avoid a memory 
leak, I introduced explicit release/destroy logic: when a metric is 
unregistered from CassandraMetricsRegistry, its array position is recycled into 
a ConcurrentSkipListMap of free ids, which acts as a kind of min-heap 
([here|https://github.com/apache/cassandra/compare/trunk...netudima:cassandra:20250_proto-trunk#diff-a50bb0b94821022a7c4e8e63a13615bed7155fd17658235489e84f635eef4784R51]).
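
A minimal sketch of the free-id recycling idea described above, not the code from the linked prototype. The class and method names (MetricIdAllocator, allocate, release, highWaterMark) are hypothetical, and a ConcurrentSkipListSet (which is backed by a ConcurrentSkipListMap) stands in for the map of free ids. It only shows how the smallest released id is handed out first, so the per-thread arrays stay compact:

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator of array positions for per-thread metric counters.
final class MetricIdAllocator {
    // Ids released by unregistered metrics, kept sorted so the smallest is reused first.
    private final ConcurrentSkipListSet<Integer> freeIds = new ConcurrentSkipListSet<>();
    // Next never-used id; it grows only when no recycled id is available.
    private final AtomicInteger nextId = new AtomicInteger();

    // Returns the smallest recycled id, or a fresh id if nothing has been released.
    int allocate() {
        Integer recycled = freeIds.pollFirst(); // atomically removes the smallest element, or returns null
        return recycled != null ? recycled : nextId.getAndIncrement();
    }

    // Called when a metric is unregistered, for example when its table is dropped.
    void release(int id) {
        freeIds.add(id);
    }

    // Number of distinct ids ever handed out, i.e. the required array length.
    int highWaterMark() {
        return nextId.get();
    }
}
```

A real implementation would also have to make sure that the values stored at a released position in every live thread's array are folded into a total or reset before the id is handed out again; that part is omitted here.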

Regarding PhantomReferences, am I right that you suggest switching to them for 
recycling ThreadLocalMetrics objects (when a ThreadLocalMetrics instance is 
associated with a dead thread), instead of the scheduled job 
[here|https://github.com/apache/cassandra/compare/trunk...netudima:cassandra:20250_proto-trunk#diff-a50bb0b94821022a7c4e8e63a13615bed7155fd17658235489e84f635eef4784R43]?
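
To make the question concrete, here is a rough sketch of what PhantomReference-based cleanup could look like. It is an assumption about the suggestion, not code from the prototype: DeadThreadCounterReaper, Tracker, register, drain and retiredTotal are hypothetical names, and a plain long[] stands in for the per-thread ThreadLocalMetrics state. Only java.lang.ref classes from the JDK are used:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reaper that salvages per-thread counters once their owning thread is gone.
final class DeadThreadCounterReaper {
    // Phantom reference to the owning thread that also carries the state to salvage.
    // The counters array must not reference the thread, or the thread never becomes unreachable.
    static final class Tracker extends PhantomReference<Thread> {
        final long[] counters;

        Tracker(Thread owner, long[] counters, ReferenceQueue<Thread> queue) {
            super(owner, queue);
            this.counters = counters;
        }
    }

    private final ReferenceQueue<Thread> queue = new ReferenceQueue<>();
    // Trackers must stay strongly reachable, otherwise they are collected and never enqueued.
    private final Set<Tracker> live = ConcurrentHashMap.newKeySet();
    // Sum of all counters that belonged to threads which have since died.
    private final AtomicLong retiredTotal = new AtomicLong();

    Tracker register(Thread owner, long[] counters) {
        Tracker tracker = new Tracker(owner, counters, queue);
        live.add(tracker);
        return tracker;
    }

    // Processes every tracker the GC has enqueued so far; returns how many were handled.
    int drain() {
        int drained = 0;
        Reference<? extends Thread> ref;
        while ((ref = queue.poll()) != null) {
            Tracker tracker = (Tracker) ref;
            if (live.remove(tracker)) {
                for (long value : tracker.counters) {
                    retiredTotal.addAndGet(value);
                }
                drained++;
            }
        }
        return drained;
    }

    long retiredTotal() {
        return retiredTotal.get();
    }
}
```

In this sketch drain() still has to be called from somewhere, either periodically or opportunistically on metric reads, so the difference from the scheduled job is mainly that the GC tells us which threads are gone instead of us scanning for dead ones.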

> Provide the ability to disable specific metrics collection
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, jmh-result.json, vmstat.log, 
> vmstat_without_metrics.log
>
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On one hand this gives better observability; on the other hand 
> metrics are not free, and there is overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load, I already see about 5.5% of 
> total CPU spent on metrics in CPU flamegraphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The 
> flamegraph was captured using the Async profiler build: 
> async-profiler-3.0-29ee888-linux-x64
> 2) Memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, plus MBeans (String 
> concatenation within object names is a major cause: a new String is created 
> for each table+metric name combination)
>  
> The idea of this ticket is to allow an operator to configure a list of 
> disabled metrics in cassandra.yaml, like:
> {code:yaml}
> disabled_metrics:
>     - metric_a
>     - metric_b
> {code}
> From an implementation point of view I see two possible approaches (which 
> can be combined):
>  # Generic: when a metric is being registered, if it is listed in 
> disabled_metrics we do not publish it via JMX and we provide a no-op 
> implementation of the metric object (such as a histogram) for it.
> Logging analogy: the log level check within a log method
>  # Specialized: for some metrics the value calculation itself is not free 
> and introduces overhead as well; in such cases it would be useful to check 
> within the specific logic, using an API (like: isMetricEnabled), whether we 
> need to do it. Example of such a metric: 
> ClientRequestSizeMetrics.recordRowAndColumnCountMetrics
> Logging analogy: an explicit 'if (isDebugEnabled())' condition used when a 
> message parameter is expensive.
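
The two approaches from the quoted description could be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleCounter, RealCounter, NoopCounter and SketchMetricsRegistry are hypothetical stand-ins, not the Codahale or CassandraMetricsRegistry API, and only isMetricEnabled is a name taken from the description:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical minimal metric interface used only for this sketch.
interface SimpleCounter {
    void inc(long n);
    long count();
}

// Regular implementation that actually aggregates values.
final class RealCounter implements SimpleCounter {
    private final LongAdder adder = new LongAdder();
    public void inc(long n) { adder.add(n); }
    public long count() { return adder.sum(); }
}

// Generic approach: a shared no-op object handed out for disabled metrics.
final class NoopCounter implements SimpleCounter {
    static final NoopCounter INSTANCE = new NoopCounter();
    private NoopCounter() {}
    public void inc(long n) { /* intentionally does nothing */ }
    public long count() { return 0; }
}

// Hypothetical registry that knows the disabled_metrics list from the configuration.
final class SketchMetricsRegistry {
    private final Set<String> disabled;

    SketchMetricsRegistry(Set<String> disabledMetrics) {
        this.disabled = new HashSet<>(disabledMetrics);
    }

    // Specialized approach: callers check this before doing an expensive value calculation.
    boolean isMetricEnabled(String name) {
        return !disabled.contains(name);
    }

    // Generic approach: disabled metrics get the no-op object (and would not be published via JMX).
    SimpleCounter counter(String name) {
        return isMetricEnabled(name) ? new RealCounter() : NoopCounter.INSTANCE;
    }
}
```

A caller with an expensive value would guard the computation with `if (registry.isMetricEnabled("metric_a"))`, while cheap updates can call `inc` unconditionally and rely on the no-op object.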



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
