Caleb Rackliffe created CASSANDRA-18580:
-------------------------------------------
Summary: Baseline Metrics for Accord Transactions
Key: CASSANDRA-18580
URL: https://issues.apache.org/jira/browse/CASSANDRA-18580
Project: Cassandra
Issue Type: Improvement
Reporter: Caleb Rackliffe
Assignee: Caleb Rackliffe
Based on some conversations w/ [~benedict] and [~dcapwell], this is the initial
set of metrics that seem both feasible to implement and useful as we monitor
the health of a cluster performing Accord transactions:
1.) Basic latency metrics for transactions up to the point of COMMIT and rate
metrics for preemption, failure, and timeouts at the coordinator.
This has already been implemented and split into read- and write-specific
metrics. Our position for now is that preemption metrics should serve as a
useful proxy for a harder-to-define metric of how many transactions are
completed via recovery.
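A minimal sketch of item 1 follows. It assumes plain `java.util.concurrent` counters in place of whatever metrics library the real implementation uses. There is one set of outcome counters per transaction kind (read or write), plus the accumulated latency up to COMMIT. All class and method names here are hypothetical, not Cassandra's or Accord's.

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: coordinator-side counters for one transaction kind.
// Names are illustrative only; they are not Cassandra's or Accord's classes.
final class TxnCoordinatorMetricsSketch
{
    enum Outcome { COMMITTED, PREEMPTED, FAILED, TIMED_OUT }

    // One counter per outcome; a monitoring system derives rates from these over time.
    private final EnumMap<Outcome, LongAdder> outcomes = new EnumMap<>(Outcome.class);
    // Total latency up to COMMIT; a real implementation would use a latency histogram/timer.
    private final LongAdder commitLatencyNanos = new LongAdder();

    TxnCoordinatorMetricsSketch()
    {
        // Populate every key up front so the map is never structurally modified afterwards.
        for (Outcome o : Outcome.values())
            outcomes.put(o, new LongAdder());
    }

    void record(Outcome outcome, long startNanos, long endNanos)
    {
        outcomes.get(outcome).increment();
        if (outcome == Outcome.COMMITTED)
            commitLatencyNanos.add(endNanos - startNanos);
    }

    long count(Outcome outcome)
    {
        return outcomes.get(outcome).sum();
    }

    double meanCommitLatencyMicros()
    {
        long commits = count(Outcome.COMMITTED);
        return commits == 0 ? 0.0 : commitLatencyNanos.sum() / 1000.0 / commits;
    }
}

// Read and write transactions get separate metric instances.
final class AccordCoordinatorMetricsSketch
{
    final TxnCoordinatorMetricsSketch reads = new TxnCoordinatorMetricsSketch();
    final TxnCoordinatorMetricsSketch writes = new TxnCoordinatorMetricsSketch();

    TxnCoordinatorMetricsSketch forTxn(boolean isWrite)
    {
        return isWrite ? writes : reads;
    }
}
```

Keeping the sketch to monotonic counters means rates (preemptions per second, timeouts per second) are computed by whatever scrapes them, rather than inside the coordinator.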
2.) Global cache stats/metrics (i.e. aggregated for all command stores)
We could, at some point, build metrics scoped to a specific {{CommandStore}},
but they might be awkward in MBean/JMX space, as command stores would have to
be identified by ID or by key range, and the latter may change across epochs.
(An alternative would be to publish command store-specific stats on demand to
a virtual table instead.)
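One way the global view could work is sketched below. It assumes each command store keeps its own hit/miss counters; the `CommandStoreCacheStats` and `GlobalCacheStats` names are hypothetical. The per-store counters are summed on read, so only a single node-wide MBean is needed and no store ever has to be named by ID or key range.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical per-command-store cache counters.
final class CommandStoreCacheStats
{
    final LongAdder hits = new LongAdder();
    final LongAdder misses = new LongAdder();

    void recordHit() { hits.increment(); }
    void recordMiss() { misses.increment(); }
}

// Node-wide view: sums whichever command stores currently exist each time it is read,
// so stores added or removed across epochs are picked up without re-registering anything.
final class GlobalCacheStats
{
    private final Supplier<List<CommandStoreCacheStats>> currentStores;

    GlobalCacheStats(Supplier<List<CommandStoreCacheStats>> currentStores)
    {
        this.currentStores = currentStores;
    }

    long hits()
    {
        long total = 0;
        for (CommandStoreCacheStats s : currentStores.get())
            total += s.hits.sum();
        return total;
    }

    long misses()
    {
        long total = 0;
        for (CommandStoreCacheStats s : currentStores.get())
            total += s.misses.sum();
        return total;
    }

    // NaN until at least one lookup has been recorded.
    double hitRate()
    {
        long h = hits(), m = misses();
        return h + m == 0 ? Double.NaN : (double) h / (h + m);
    }
}
```

One consequence of summing on read is that a store's counts drop out of the totals once the store is removed. If monotonic node-wide counters matter, global counters could instead be incremented alongside the per-store ones.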
3.) Something like a decaying histogram of the number of dependencies per
transaction (or per partial transaction).
If this gets worse over time, it could be a useful way for us to detect that
contention is increasing. We should be able to hook this up to
{{ProgressLog}} notifications. Recording against {{PartialDeps}}/{{PartialTxn}}
(which {{ProgressLog}} gives us at pre-accept) rather than the full transaction
seems acceptable, given this is a directional metric.
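Item 3 could be sketched as follows. The sketch assumes forward decay, where each sample is weighted by exp(alpha * (t - landmark)) so that older samples count for less, together with power-of-two buckets. The class name and the half-life parameter are hypothetical, as is the idea that a caller passes in the dependency count observed at pre-accept. None of this is Accord's `ProgressLog` API.

```java
// Hypothetical sketch of a forward-decaying histogram of dependencies per (partial) transaction.
final class DecayingDepsHistogram
{
    // Bucket 0 holds zero deps; bucket i (i >= 1) holds counts in [2^(i-1), 2^i).
    private static final int BUCKETS = 64;

    private final double alphaPerMilli;        // decay rate derived from the half-life
    private final long rescaleIntervalMillis;  // how far the landmark may lag before rescaling
    private final double[] bucketWeights = new double[BUCKETS];
    private double totalWeight;
    private double weightedSum;
    private long landmarkMillis;

    DecayingDepsHistogram(double halfLifeMillis, long startMillis)
    {
        if (halfLifeMillis <= 0) throw new IllegalArgumentException("half-life must be > 0");
        this.alphaPerMilli = Math.log(2) / halfLifeMillis;
        this.rescaleIntervalMillis = (long) (halfLifeMillis * 10);
        this.landmarkMillis = startMillis;
    }

    static int bucketFor(long deps)
    {
        return 64 - Long.numberOfLeadingZeros(deps);
    }

    synchronized void record(long deps, long nowMillis)
    {
        if (deps < 0) throw new IllegalArgumentException("deps must be >= 0");
        if (nowMillis - landmarkMillis > rescaleIntervalMillis)
            rescale(nowMillis);
        // Forward decay: newer samples get exponentially larger weights relative to the landmark.
        double w = Math.exp(alphaPerMilli * (nowMillis - landmarkMillis));
        bucketWeights[bucketFor(deps)] += w;
        totalWeight += w;
        weightedSum += w * deps;
    }

    // Move the landmark forward so weights stay in a safe floating-point range.
    private void rescale(long nowMillis)
    {
        double factor = Math.exp(-alphaPerMilli * (nowMillis - landmarkMillis));
        for (int i = 0; i < BUCKETS; i++)
            bucketWeights[i] *= factor;
        totalWeight *= factor;
        weightedSum *= factor;
        landmarkMillis = nowMillis;
    }

    // Decayed mean; the common landmark factor cancels, so no "now" is needed.
    synchronized double mean()
    {
        return totalWeight == 0 ? 0.0 : weightedSum / totalWeight;
    }

    // Share of the decayed weight that falls in the bucket containing 'deps'.
    synchronized double fractionInBucketOf(long deps)
    {
        return totalWeight == 0 ? 0.0 : bucketWeights[bucketFor(deps)] / totalWeight;
    }
}
```

Because the same landmark factor scales every weight, the decayed mean and bucket fractions can be read without knowing the current time. Rescaling exists only to keep `Math.exp` from overflowing during long uptimes.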
--
This message was sent by Atlassian Jira
(v8.20.10#820010)