[jira] [Commented] (CASSANDRA-18580) Baseline Metrics for Accord Transactions

Jacek Lewandowski (Jira) Tue, 01 Aug 2023 04:59:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749653#comment-17749653
 ]


Jacek Lewandowski commented on CASSANDRA-18580:
-----------------------------------------------

When I tried to implement that, I encountered a problem - how to get notified 
on a replica that a transaction was applied (writes were made durable). I 
noticed that in such a case Accord notifies the agent by calling its generic 
{{apply}} method. That method is currently called in many different situations 
so we cannot blindly assume that calling it always denotes the end of the 
transaction. My question is that if it would be ok to extends the agent to have 
a method like {{onCompleted(TxnId, Throwable failure)}} so that we could 
trigger the metrics?


> Baseline Metrics for Accord Transactions
> ----------------------------------------
>
>                 Key: CASSANDRA-18580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18580
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Accord, Observability/JMX, Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on some conversations w/ [~benedict] and [~dcapwell], this is the 
> initial set of metrics that seem both feasible to implement and useful as we 
> monitor the health of a cluster performing Accord transactions:
> 1.) Basic latency metrics for transactions up to the point of COMMIT and rate 
> metrics for preemption, failure, and timeouts at the coordinator.
> This has already been implemented and split into read and write-specific 
> metrics. Our position for now is that metrics around preemption should be 
> useful in place of a more difficult-to-define metric around how many 
> transactions are completed via recovery.
> 2.) Global cache stats/metrics (i.e. aggregated for all command stores)
> We could, at some point, build metrics scoped to a specific {{CommandStore}}, 
> but they might be awkward in MBean/JMX space, as command stores would have to 
> be identified by ID or key range…the latter possibly being able to change 
> across epochs. (An alternative would be just publishing command 
> store-specific stats on-demand to a virtual table instead.)
> 3.) Something like a decaying histogram of the number of dependencies per 
> transaction (or per partial transaction).
> If this is getting worse over time, it could be useful to know/be a way for 
> us to detect that contention is increasing. We should be able to hook this up 
> to {{ProgressLog}} notifications. Recording for PartialDeps/PartialTxn (which 
> ProgressLog gives us at pre-accept) seems acceptable, given this is a 
> directional metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-18580) Baseline Metrics for Accord Transactions

Reply via email to