[jira] [Commented] (CASSANDRA-18580) Baseline Metrics for Accord Transactions

Jacek Lewandowski (Jira) Tue, 01 Aug 2023 01:33:17 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749561#comment-17749561
 ]


Jacek Lewandowski commented on CASSANDRA-18580:
-----------------------------------------------

Some additional metrics that could be considered are:
- meter of transactions executed with the fast path
- meter of transactions executed with the slow path

Is it correct to think that {{txnId == executeAt}} means a transaction was 
executed via the fast path?
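
To make the idea concrete, here is a minimal sketch of the two counters. It 
assumes the {{txnId == executeAt}} rule above holds, which is the open 
question. {{Stamp}} and {{PathCounters}} are hypothetical stand-ins (not 
Accord classes), and {{LongAdder}} stands in for whatever meter type the 
metrics registry provides:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a transaction timestamp; not an Accord class.
final class Stamp {
    final long epoch;
    final long hlc;
    final int node;

    Stamp(long epoch, long hlc, int node) {
        this.epoch = epoch;
        this.hlc = hlc;
        this.node = node;
    }

    // True when every component matches.
    boolean sameAs(Stamp other) {
        return epoch == other.epoch && hlc == other.hlc && node == other.node;
    }
}

// Hypothetical holder for the two proposed rates; LongAdder replaces a real meter.
final class PathCounters {
    final LongAdder fastPath = new LongAdder();
    final LongAdder slowPath = new LongAdder();

    // ASSUMPTION: executeAt equal to txnId means the fast path was taken.
    void record(Stamp txnId, Stamp executeAt) {
        if (txnId.sameAs(executeAt)) {
            fastPath.increment();
        } else {
            slowPath.increment();
        }
    }
}
```

If the assumption turns out to be wrong, only the condition in {{record}} 
would need to change; the two counters stay the same.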

More thoughts:
{{TxnId.rw()}} tells us whether the transaction is read-only or read/write. 
Also, {{TxnId.domain()}} tells us whether the transaction is keys or domain 
oriented, but the latter is not implemented in Cassandra for now IIUC. 

We can also take the ballot HLC into account to measure, for example:
- time from start to recovery
- recovery time
- conditional execution time - either from executeAt or ballot until executed
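
A rough sketch of how these three durations could be derived is below. It 
assumes the HLC values of the txnId, the ballot, and executeAt are comparable 
numbers in one time unit, and that the moment of execution is also available 
as such a value. {{RecoveryTimings}} and all parameter names are hypothetical:

```java
// Hypothetical helper; none of these names exist in Accord or Cassandra.
final class RecoveryTimings {
    // Clocks on different nodes may be skewed, so negative differences are clamped to zero.
    private static long elapsed(long from, long to) {
        return Math.max(0L, to - from);
    }

    // Time from the start of the transaction until a recovery ballot was issued.
    static long startToRecovery(long txnIdHlc, long ballotHlc) {
        return elapsed(txnIdHlc, ballotHlc);
    }

    // Time from the recovery ballot until the transaction was executed.
    static long recoveryTime(long ballotHlc, long executedHlc) {
        return elapsed(ballotHlc, executedHlc);
    }

    // Measured from the ballot when recovery happened, otherwise from executeAt.
    static long executionTime(long executeAtHlc, long ballotHlc, boolean recovered, long executedHlc) {
        long from = recovered ? ballotHlc : executeAtHlc;
        return elapsed(from, executedHlc);
    }
}
```

Note that in this sketch the recovery time and the conditional execution time 
are the same number for a recovered transaction, so they may not need to be 
separate metrics.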

wdyt?

cc [~henrik.ingo]

> Baseline Metrics for Accord Transactions
> ----------------------------------------
>
>                 Key: CASSANDRA-18580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18580
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Accord, Observability/JMX, Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on some conversations w/ [~benedict] and [~dcapwell], this is the 
> initial set of metrics that seem both feasible to implement and useful as we 
> monitor the health of a cluster performing Accord transactions:
> 1.) Basic latency metrics for transactions up to the point of COMMIT and rate 
> metrics for preemption, failure, and timeouts at the coordinator.
> This has already been implemented and split into read and write-specific 
> metrics. Our position for now is that metrics around preemption should be 
> useful in place of a more difficult-to-define metric around how many 
> transactions are completed via recovery.
> 2.) Global cache stats/metrics (i.e. aggregated for all command stores)
> We could, at some point, build metrics scoped to a specific {{CommandStore}}, 
> but they might be awkward in MBean/JMX space, as command stores would have to 
> be identified by ID or key range…the latter possibly being able to change 
> across epochs. (An alternative would be just publishing command 
> store-specific stats on-demand to a virtual table instead.)
> 3.) Something like a decaying histogram of the number of dependencies per 
> transaction (or per partial transaction).
> If this is getting worse over time, it could be a useful way for us to 
> detect that contention is increasing. We should be able to hook this up to 
> {{ProgressLog}} notifications. Recording for PartialDeps/PartialTxn (which 
> ProgressLog gives us at pre-accept) seems acceptable, given this is a 
> directional metric.


