[jira] [Updated] (CASSANDRA-17175) More detailed latency metrics

Stefan Miklosovic (Jira) Mon, 13 Dec 2021 10:31:07 -0800


     [ https://issues.apache.org/jira/browse/CASSANDRA-17175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stefan Miklosovic updated CASSANDRA-17175:
------------------------------------------
    Description: 
There is a disconnect between the latency that clients experience and the 
latency reported by Cassandra. For example, read latency only measures the 
latency of the StorageProxy::readRows call.

None of the time spent sitting in the Native Transport queue is measured. 
Neither is any of the time for writing the response back to the channel.

Dispatcher processRequest keeps track of when it first starts processing the 
request but, as best I can tell, this is only used for tracking timeouts.

It would be useful for tracking down the cause of high client latency if there 
were more detailed Cassandra metrics around it.

I have attached a patch that adds latency tracking higher in the call stack, 
starting the timer before the request is put into the Native Transport Request 
(NTR) executor. The patch gives 3 different metrics per Request type:

delay - time from when the request is submitted to the NTR pool until 
processRequest is called

process - time spent in the Dispatcher processRequest call

total - time from when the request is first submitted to the NTR pool until 
the response has been flushed

This patch may not be the cleanest or best way of doing this, but it hopefully 
gives an idea of what I think would be a useful addition that will help 
operators diagnose latency issues.
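
As a rough illustration of the three measurement points described above, here 
is a minimal, self-contained sketch. It is not the attached patch and does not 
use Cassandra's Dispatcher or metrics classes: the class RequestLatencySketch, 
the Timings holder, the submit helper and the injected clock are all 
hypothetical names, a plain ExecutorService stands in for the NTR pool, and the 
flush is assumed to run synchronously on the worker thread, which is a 
simplification.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.LongSupplier;

public class RequestLatencySketch {

    /** Hypothetical holder for the three per-request timings, in nanoseconds. */
    static final class Timings {
        long delayNanos;   // submitted to the pool -> processing starts
        long processNanos; // time spent inside the processing call
        long totalNanos;   // submitted to the pool -> response flushed
    }

    /**
     * Runs one request on the executor and records delay, process and total.
     * The clock is injected so the arithmetic can be checked deterministically.
     */
    static Timings submit(ExecutorService executor,
                          LongSupplier clock,
                          Runnable processRequest,
                          Runnable flushResponse) throws Exception {
        Timings timings = new Timings();
        // The timer starts before the request is handed to the executor,
        // so time spent waiting in the queue is included.
        long submitted = clock.getAsLong();
        executor.submit(() -> {
            long started = clock.getAsLong();
            timings.delayNanos = started - submitted;

            processRequest.run();
            long processed = clock.getAsLong();
            timings.processNanos = processed - started;

            flushResponse.run();
            timings.totalNanos = clock.getAsLong() - submitted;
        }).get(); // waiting on the Future makes the worker's writes visible here
        return timings;
    }

    private static void sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Timings t = submit(executor, System::nanoTime,
                               () -> sleepMillis(5),  // stand-in for request processing
                               () -> sleepMillis(1)); // stand-in for flushing the response
            System.out.println("delay(ns)=" + t.delayNanos
                               + " process(ns)=" + t.processNanos
                               + " total(ns)=" + t.totalNanos);
        } finally {
            executor.shutdown();
        }
    }
}
```

With these definitions, whatever part of total is not covered by delay plus 
process is time spent after processing returns, which in this sketch is the 
flush step.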

  was:
There is a disconnect with latency clients experience and the latency reported 
by Cassandra. For example read latency only measures the latency of the 
StorageProxy::readRows call.

None of the time spent sitting in the Native Transport queue is measured. 
Neither is any of the time for writing the response back to the channel.

Dispatcher processRequest keep track of when if first starts processing the 
request but best I can tell this is only used in tracking for timeouts.

It would be useful for tracking down cause of high client latency if there was 
more detailed cassandra metrics around it.

I have attached a patch that adds latency tracking higher in the call stack. 
Starting timer from before its put into the Native Transport Request executor. 
The patch gives 3 different metrics per Request type:

delay - measures time from when its submitted to NTR pool till it call 
processRequest

process - time spent in the Dispatcher processRequest call

total - time from when first submitted to NTR pool until the response has been 
flushed

 

This patch may not be cleanest or best way of doing this but hopefully gives an 
idea of what I think would be useful addition that will help operators diagonse 
latency issues.


> More detailed latency metrics
> -----------------------------
>
>                 Key: CASSANDRA-17175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17175
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.x
>
>         Attachments: request_latency_metric.patch



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

