I'd say no, a range query probably isn't the best choice for monitoring, but
it really depends on how important it is that the range you select is
consistent.

From those traces it does seem that the bulk of the time spent was waiting
for responses from the replicas, which may indicate a network issue, but
it's not conclusive evidence.
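
If it helps to make "where did the time go" a bit more concrete, here is a
minimal Python sketch that finds the largest gaps between consecutive trace
events. It is only an illustration: the TraceEvent tuple, the largest_gaps
helper, and the sample values are all hypothetical, and it assumes you have
already turned each trace row into "microseconds since the trace session
started" yourself. It doesn't talk to Cassandra at all.

```python
from typing import List, NamedTuple, Tuple


class TraceEvent(NamedTuple):
    """One trace row (hypothetical shape, not a driver or Cassandra type)."""
    elapsed_us: int  # microseconds since the trace session started (assumed)
    source: str      # node that logged the event
    activity: str    # activity text from the trace


def largest_gaps(
    events: List[TraceEvent], top: int = 3
) -> List[Tuple[int, TraceEvent, TraceEvent]]:
    """Return the `top` largest gaps as (gap_us, event_before, event_after)."""
    ordered = sorted(events, key=lambda e: e.elapsed_us)
    gaps = [
        (after.elapsed_us - before.elapsed_us, before, after)
        for before, after in zip(ordered, ordered[1:])
    ]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# Made-up sample data, only to show the shape of the input.
sample_events = [
    TraceEvent(0, "10.0.0.1", "coordinator starts request"),
    TraceEvent(400, "10.0.0.1", "coordinator sends request to replica"),
    TraceEvent(900, "10.0.0.2", "replica receives request"),
    TraceEvent(52000, "10.0.0.1", "coordinator receives response"),
    TraceEvent(52300, "10.0.0.1", "request complete"),
]
```

Calling largest_gaps(sample_events, top=1) on that made-up data picks out the
stretch between the replica receiving the request and the coordinator getting
the response. A large gap like that tells you where the wait is, not why, so
it fits with the "not conclusive" caveat above.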

For SSTables you could check the SSTables per read for the query, but that's
unnecessary as the traces indicate that's not the issue. It might be worth
trying to debug potential network issues, and looking into metrics like
CoordinatorReadLatency and CoordinatorScanLatency at the table level:
https://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics
Also, if you have any network traffic metrics between nodes, those would be a
good place to look.

Other than that, I'd look in the logs on each node when you run the trace
and try to identify any errors that could be causing problems.
