[jira] [Comment Edited] (CASSANDRA-20250) Optimize Counter, Meter and Histogram metrics using thread local counters

Dmitry Konstantinov (Jira) Thu, 13 Nov 2025 15:09:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023420#comment-18023420
 ]


Dmitry Konstantinov edited comment on CASSANDRA-20250 at 11/13/25 11:08 PM:
----------------------------------------------------------------------------

The following items are left for follow-up tickets:
 * measure the overhead of reads and evaluate a batch snapshotting option to 
reduce read costs
 * try StampedLock instead of ReadWriteLock
 * extend FastThreadLocalThread and make the ThreadLocalMetrics instance an 
instance variable - CASSANDRA-21020
 * explore options to reduce memory usage, such as int-to-int maps or other 
more compact representations
 * make the recycling logic more reliable
 * apply the thread-local timer logic to 
org.apache.cassandra.metrics.LatencyMetrics.LatencyMetricsTimer
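
For the StampedLock item, a minimal sketch of the optimistic-read pattern from 
java.util.concurrent may help frame what would be tried. The class and field 
names (StampedSnapshot, count, sum) are hypothetical, and where the lock 
actually sits in the patch is not stated in this comment, so this only 
illustrates the general technique:

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical example class; not taken from the Cassandra code base.
final class StampedSnapshot
{
    private final StampedLock lock = new StampedLock();
    private long count;
    private long sum;

    // Writers take the exclusive lock, as they would with a ReadWriteLock.
    void update(long value)
    {
        long stamp = lock.writeLock();
        try
        {
            count++;
            sum += value;
        }
        finally
        {
            lock.unlockWrite(stamp);
        }
    }

    // Readers first try a lock-free optimistic read and fall back to a
    // real read lock only if a writer intervened in the meantime.
    long[] snapshot()
    {
        long stamp = lock.tryOptimisticRead();
        long c = count;
        long s = sum;
        if (!lock.validate(stamp))
        {
            stamp = lock.readLock();
            try
            {
                c = count;
                s = sum;
            }
            finally
            {
                lock.unlockRead(stamp);
            }
        }
        return new long[]{ c, s };
    }
}
```

Whether this beats ReadWriteLock depends on the read/write mix, which is 
what the follow-up measurement would have to show.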


was (Author: dnk):
there are the following things for follow-up tickets:
* to measure overheads for reads and evaluate a batch snapshotting option to 
reduce read costs
* try StampedLock instead of ReadWriteLock
* extending FastThreadLocalThread and making ThreadLocalMetrics instance an 
instance variable
* explore potential options to reduce memory usage such as using of int to int 
maps or other more compact representations
* more reliable recycling logic
* apply the thread local timer logic to 
org.apache.cassandra.metrics.LatencyMetrics.LatencyMetricsTimer


> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.1
>
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> CASSANDRA-20250_ci_summary.html, CASSANDRA-20250_results_details.tar.xz, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cas_reverse_graph_metrics.png, cpu_profile_insert.html, 
> image-2025-02-18-23-22-19-983.png, jmh-result.json, vmstat.log, 
> vmstat_without_metrics.log
>
>          Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On the one hand this gives better observability; on the other hand 
> metrics are not free, and there is an overhead associated with them:
> 1) CPU overhead: under a simple CPU-bound load I already see about 5.5% of 
> total CPU spent on metrics in CPU flamegraphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The 
> flamegraph was captured using this Async profiler build: 
> async-profiler-3.0-29ee888-linux-x64
> 2) Memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, plus on MBeans (String 
> concatenation within object names is a major cause of it: a new String is 
> created for each table+metric name combination).
> LongAdder is used by Dropwizard Counter, Meter and Histogram metrics for 
> counting purposes. It has a severe memory overhead and, while it scales 
> better than AtomicLong, we still pay some cost for the concurrent operations. 
> Additionally, in the case of Meter we have non-optimal behaviour: we 
> count the same things several times.
> The idea (suggested by [~benedict]) is to switch to thread-local counters, 
> which we can store in a common thread-local array to reduce memory overhead. 
> In this way we can avoid concurrent update overhead and contention and 
> reduce the memory footprint as well.
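
The idea in the description can be sketched roughly as follows. This is an 
illustration under stated assumptions, not the code from the patch: the class 
name ThreadLocalCounters, the fixed capacity and the register/add/sum methods 
are all hypothetical. Each thread owns one array that holds its share of 
every counter, so an update touches only thread-private memory, and a read 
sums the slot across all threads:

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; names and layout are assumptions, not the patch.
final class ThreadLocalCounters
{
    private final int capacity;
    private final AtomicInteger nextId = new AtomicInteger();
    // Every per-thread array is also registered here so readers can sum them.
    // Arrays of terminated threads are never removed in this sketch.
    private final CopyOnWriteArrayList<AtomicLongArray> allSlots = new CopyOnWriteArrayList<>();
    private final ThreadLocal<AtomicLongArray> local;

    ThreadLocalCounters(int capacity)
    {
        this.capacity = capacity;
        this.local = ThreadLocal.withInitial(() -> {
            AtomicLongArray slots = new AtomicLongArray(capacity);
            allSlots.add(slots);
            return slots;
        });
    }

    // Allocates a slot index shared by all threads for one logical counter.
    int register()
    {
        int id = nextId.getAndIncrement();
        if (id >= capacity)
            throw new IllegalStateException("no free counter slots");
        return id;
    }

    // Single writer per array: a plain read plus a release store, no CAS.
    void add(int id, long delta)
    {
        AtomicLongArray slots = local.get();
        slots.setRelease(id, slots.getPlain(id) + delta);
    }

    // Reads pay the cost instead: one acquire load per thread.
    long sum(int id)
    {
        long total = 0;
        for (AtomicLongArray slots : allSlots)
            total += slots.getAcquire(id);
        return total;
    }
}
```

The trade-off is visible in the sketch: writes become cheap and 
contention-free, while the read cost grows with the number of threads.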



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

