[
https://issues.apache.org/jira/browse/CASSANDRA-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Jirsa reassigned CASSANDRA-14281:
--------------------------------------
Assignee: Michael Burman
> LatencyMetrics performance
> --------------------------
>
> Key: CASSANDRA-14281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14281
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Burman
> Assignee: Michael Burman
> Priority: Major
>
> Currently, for each write/read/rangequery/CAS touching the CFS, we write a
> latency metric, which takes a lot of processing time (up to 66% of the total
> processing time if the update was empty).
> Latencies are recorded using both a dropwizard "Timer" and a "Counter". The
> latter is used for totalLatency, and the former is a decaying metric for
> rates and certain percentile metrics. We then replicate all of these CFS
> writes to the KeyspaceMetrics and globalWriteLatencies.
> For example, for each CFS write we first write to the CFS's metrics, then
> to the Keyspace's metrics, and finally to globalMetrics. A Timer maintains
> a Histogram and a Meter and updates both whenever the Timer is updated.
> The Meter in turn updates 4 different values (1 minute rate, 5 minute rate,
> 15 minute rate and a counter).
> So for each CFS write we actually do 15 different counter updates, and of
> course maintain their states at the same time while writing. These
> operations are very slow when combined.
> A small JMH benchmark doing an update against a single LatencyMetrics with 4
> threads gives us around 5.2M updates / second. With the current writeLatency
> metric (having 2 parents) we get only 1.6M updates / second.
> I'm proposing to update this to use a small circular-buffer HdrHistogram
> implementation. We would maintain a rolling buffer holding the last 15
> minutes of histograms (30 seconds per histogram) and update the correct
> bucket on each write. When metrics are requested, we would merge the
> requested number of buckets into a new histogram and compute the results
> from it. This moves some of the load from writing the metrics to reading
> them (a much less frequent operation), including for the parent metrics. It
> also allows us to maintain the current metrics structure, if we wish to do so.
> My prototype with this approach improves the performance to around 13.8M
> updates/second, almost 9 times faster than the current approach. HdrHistogram
> is also already shipped in Cassandra's lib, so there are no new
> dependencies to add (java-driver also uses it).
> FUTURE:
> This opens up some possibilities, such as replacing all dropwizard
> Histograms/Meters with the new approach (to reduce overhead elsewhere in the
> codebase). It would also allow us to supply downloadable histograms directly
> from Cassandra, or store them to disk each time a bucket is filled, if the
> user wishes to monitor latency history or graph all percentiles.
> HdrHistogram also provides the ability to "fix" these histograms with pause
> tracking, such as for GC pauses, which we currently can't do (as dropwizard
> histograms can't be merged).
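
The rolling-buffer idea quoted above can be sketched roughly as follows. This
is only an illustration, not the prototype from the ticket: the class name
RollingLatencySketch and all of its methods are hypothetical, and a crude
power-of-two bin array stands in for HdrHistogram so that the example needs
nothing beyond the JDK. Time is passed in explicitly to keep the behaviour
easy to follow.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical sketch of a rolling buffer of per-window histograms.
 * Assumption: a power-of-two bin array is used in place of HdrHistogram.
 */
final class RollingLatencySketch {
    private static final int BINS = 64;      // bin b counts values in [2^b, 2^(b+1))

    private final int windows;               // e.g. 30 windows
    private final long windowMillis;         // e.g. 30_000 ms each, 15 minutes in total
    private final AtomicLongArray[] counts;  // counts[slot] holds the bins of one window
    private final AtomicLongArray epochs;    // window number currently stored in each slot

    RollingLatencySketch(int windows, long windowMillis) {
        this.windows = windows;
        this.windowMillis = windowMillis;
        this.counts = new AtomicLongArray[windows];
        this.epochs = new AtomicLongArray(windows);
        for (int i = 0; i < windows; i++) {
            counts[i] = new AtomicLongArray(BINS);
            epochs.set(i, -1L);              // -1 marks a slot that was never written
        }
    }

    /** Write path: a single counter increment in the window covering nowMillis. */
    void update(long latencyNanos, long nowMillis) {
        long epoch = nowMillis / windowMillis;
        int slot = (int) (epoch % windows);
        if (epochs.get(slot) != epoch) {
            rotate(slot, epoch);             // slot still holds an expired window
        }
        counts[slot].incrementAndGet(binFor(latencyNanos));
    }

    /** Clears a slot for reuse. Simplified: an increment racing with this may be lost. */
    private synchronized void rotate(int slot, long epoch) {
        if (epochs.get(slot) == epoch) return;   // another thread already rotated it
        for (int b = 0; b < BINS; b++) counts[slot].set(b, 0L);
        epochs.set(slot, epoch);
    }

    /** Read path: merge the most recent lastWindows windows into one bin array. */
    long[] merge(int lastWindows, long nowMillis) {
        long newest = nowMillis / windowMillis;
        long oldest = newest - Math.min(lastWindows, windows) + 1;
        long[] merged = new long[BINS];
        for (int slot = 0; slot < windows; slot++) {
            long epoch = epochs.get(slot);
            if (epoch < oldest || epoch > newest) continue;  // expired or unused slot
            for (int b = 0; b < BINS; b++) merged[b] += counts[slot].get(b);
        }
        return merged;
    }

    /** Total number of samples in a merged bin array. */
    static long count(long[] merged) {
        long total = 0;
        for (long c : merged) total += c;
        return total;
    }

    /** floor(log2(value)), with values below 1 mapped to bin 0. */
    private static int binFor(long value) {
        return 63 - Long.numberOfLeadingZeros(Math.max(value, 1L));
    }
}
```

With 30 windows of 30 seconds each, merge(2, now) would approximate a
one-minute view and merge(30, now) the full 15 minutes. The point of the
design is that each write touches one counter, while the cost of combining
windows is paid only when metrics are read.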