[ https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Konstantinov updated CASSANDRA-20250:
--------------------------------------------
    Status: Patch Available  (was: In Progress)

> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_profile_cpu.html,
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html,
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip,
> cas_reverse_graph_metrics.png, cpu_profile_insert.html,
> image-2025-02-18-23-22-19-983.png, jmh-result.json, vmstat.log,
> vmstat_without_metrics.log
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Cassandra collects a lot of metrics, and many of them are collected per
> table, so the number of metric instances is multiplied by the number of
> tables. On the one hand this gives better observability; on the other hand
> metrics are not free, and there is overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load, I already see about 5.5% of
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for a
> write load.
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The
> flame graph was captured using the Async profiler build
> async-profiler-3.0-29ee888-linux-x64.
> 2) Memory overhead: we spend memory on the entities used to aggregate
> metrics, such as LongAdders and reservoirs, and on MBeans (String
> concatenation within object names is a major cause: a new String is created
> for each table+metric name combination).
> LongAdder is used by Dropwizard Counter/Meter and Histogram metrics for
> counting purposes.
> It has a severe memory overhead and, while it scales better than
> AtomicLong, we still have to pay some cost for the concurrent operations.
> Additionally, in the case of Meter, we have non-optimal behaviour: we
> count the same things several times.
> The idea (suggested by [~benedict]) is to switch to thread-local counters,
> which we can store in a common thread-local array to reduce memory overhead.
> In this way we can avoid the overhead and contention of concurrent updates
> and reduce the memory footprint as well.
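
The thread-local array idea described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the CASSANDRA-20250 patch: the class name ThreadLocalCounters and its methods (register, add, sum, demo) are hypothetical. Each thread owns one array of slots, every counter is an index into that array, and a reader sums the slot across all threads' arrays. Because each slot has a single writer, an update needs no compare-and-swap.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: one slot array per thread, one slot index per counter.
final class ThreadLocalCounters {
    private final int capacity;
    private final AtomicInteger nextId = new AtomicInteger();
    // All per-thread arrays, kept so that readers can aggregate across threads.
    // Arrays of finished threads stay here: their counts are preserved, but a
    // real implementation would need to fold them into a shared total and drop them.
    private final List<AtomicLongArray> allSlots = new CopyOnWriteArrayList<>();
    private final ThreadLocal<AtomicLongArray> localSlots;

    ThreadLocalCounters(int capacity) {
        this.capacity = capacity;
        this.localSlots = ThreadLocal.withInitial(() -> {
            AtomicLongArray slots = new AtomicLongArray(capacity);
            allSlots.add(slots);
            return slots;
        });
    }

    // Reserves a slot index for a new counter.
    int register() {
        int id = nextId.getAndIncrement();
        if (id >= capacity) {
            throw new IllegalStateException("no free counter slots");
        }
        return id;
    }

    // Only the owning thread writes to its array, so a read followed by an
    // ordered write is enough; no CAS loop and no cross-thread contention.
    void add(int id, long delta) {
        AtomicLongArray slots = localSlots.get();
        slots.lazySet(id, slots.get(id) + delta);
    }

    // Sums one counter over every thread's array. The result may lag slightly
    // behind concurrent writers, which is usually acceptable for metrics.
    long sum(int id) {
        long total = 0;
        for (AtomicLongArray slots : allSlots) {
            total += slots.get(id);
        }
        return total;
    }

    // Small demo: several threads bump the same counter without sharing a slot.
    static long demo(int threads, int incrementsPerThread) {
        ThreadLocalCounters counters = new ThreadLocalCounters(8);
        int writes = counters.register();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counters.add(writes, 1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.sum(writes);
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

The trade-off in this sketch is that reads become more expensive (one pass over all threads' arrays), while writes touch only thread-private memory. All counters of a thread share one array instead of each counter holding its own LongAdder cells, which is where the memory saving mentioned in the description would come from.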