[
https://issues.apache.org/jira/browse/CASSANDRA-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Caleb Rackliffe updated CASSANDRA-16701:
----------------------------------------
Test and Documentation Plan: some new unit test coverage for the
WaitingOnCommit and WaitingOnFlush metrics w/ batch CL
Status: Patch Available (was: In Progress)
[patch (trunk)|https://github.com/apache/cassandra/pull/1080]
[j8 tests|https://app.circleci.com/pipelines/github/maedhroz/cassandra/276/workflows/0ebf6271-cd11-4eb6-9cb9-e9059c228877]
[j11 tests|https://app.circleci.com/pipelines/github/maedhroz/cassandra/276/workflows/62924207-89b1-4fd2-bbd7-25028de8c441]
> Data Points for the CommitLog's WaitingOnCommit Metric Should Describe Single
> Mutations
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16701
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Commit Log, Observability/JMX
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.x
>
>
> The metrics we have around the {{CommitLog}} aren’t as useful as they could
> be in the context of investigating the performance of local writes.
> 1.) We have no way to know how long the actual flush to disk takes in
> isolation, i.e. separate from the signaling apparatus between mutation
> threads and the sync thread. We should add a metric for this.
> 2.) The WaitingOnCommit metric can have multiple data points recorded for a
> single mutation, which is a little awkward when we’re trying to break down
> the latency of a local write (total time for CL add + Memtable put, etc.).
> More specifically, a thread waits for the sync thread to catch up to the
> position of its mutation, but it can be woken by a sync operation that hasn’t
> reached that position yet, which triggers another wait. A new data point is
> recorded for the metric each time this happens. We should move the scope of
> metric recording up a level so that there is a 1-1 relationship between it and
> WriteLatency in TableMetrics (which covers row cache updates and the Memtable
> put).
> {noformat}
> void waitForSync(int position, Timer waitingOnCommit)
> {
>     while (lastSyncedOffset < position)
>     {
>         WaitQueue.Signal signal = waitingOnCommit != null ?
>                                   syncComplete.register(waitingOnCommit.time()) :
>                                   syncComplete.register();
>         if (lastSyncedOffset < position)
>             signal.awaitUninterruptibly();
>         else
>             signal.cancel();
>     }
> }
> {noformat}
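
For illustration only, here is a minimal sketch of what "moving the scope of metric recording up a level" could look like. It is not the code from the patch: the class SyncWaitSketch, the methods markSynced, dataPoints and demo, and the plain list standing in for the Timer are all hypothetical, and a java.util.concurrent Condition is used in place of Cassandra's WaitQueue. The point it tries to show is that the clock starts once per call and one data point is recorded after the loop exits, no matter how many times the thread is woken early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a commit log segment; not Cassandra's actual classes.
class SyncWaitSketch
{
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncComplete = lock.newCondition();
    // Stand-in for the WaitingOnCommit timer: one entry per recorded data point.
    private final List<Long> waitingOnCommitNanos = new ArrayList<>();
    private int lastSyncedOffset = 0;

    // Called by the (simulated) sync thread after it has synced up to 'offset'.
    void markSynced(int offset)
    {
        lock.lock();
        try
        {
            lastSyncedOffset = Math.max(lastSyncedOffset, offset);
            syncComplete.signalAll(); // wakes every waiter, including ones not yet covered
        }
        finally
        {
            lock.unlock();
        }
    }

    // Called by a mutation thread. Timing wraps the whole loop, so exactly one
    // data point is recorded per call even if the thread re-waits several times.
    void waitForSync(int position)
    {
        long start = System.nanoTime();
        lock.lock();
        try
        {
            while (lastSyncedOffset < position)
                syncComplete.awaitUninterruptibly();
            waitingOnCommitNanos.add(System.nanoTime() - start);
        }
        finally
        {
            lock.unlock();
        }
    }

    int dataPoints()
    {
        lock.lock();
        try
        {
            return waitingOnCommitNanos.size();
        }
        finally
        {
            lock.unlock();
        }
    }

    // One mutation waits for position 10 while the sync thread advances in
    // three steps; returns the number of data points recorded.
    static int demo()
    {
        SyncWaitSketch log = new SyncWaitSketch();
        Thread mutation = new Thread(() -> log.waitForSync(10));
        mutation.start();
        log.markSynced(3);
        log.markSynced(7);
        log.markSynced(10);
        try
        {
            mutation.join();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return log.dataPoints();
    }
}
```

Calling SyncWaitSketch.demo() returns 1: the two early wake-ups (at offsets 3 and 7) do not produce extra data points, which is the 1-1 relationship with a per-mutation metric that the description asks for. Whether a call that never has to wait should still record a (near-zero) data point is a design choice this sketch makes for simplicity.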