Re: ALL range query monitors failing frequently

2017-06-29 Thread Matthew O'Riordan
Thanks Kurt, I appreciate that feedback. I’ll investigate the metrics more fully and come back with my findings. In terms of logs, I did look in the logs of the nodes and found nothing, I’m afraid. On Wed, Jun 28, 2017 at 11:33 PM, kurt greaves wrote: > I'd say that no, a

Re: ALL range query monitors failing frequently

2017-06-28 Thread kurt greaves
I'd say that no, a range query probably isn't the best choice for monitoring, but it really depends on how important it is that the range you select is consistent. From those traces it does seem that the bulk of the time spent was waiting for responses from the replicas, which may indicate a network
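To make the point about replica wait time concrete, here is a minimal Python sketch of why ALL is the most fragile consistency level. It is a simplified model, not Cassandra's coordinator code: the function names are hypothetical, only single-datacenter levels are modelled, and `None` stands for a replica that never answers. The coordinator waits for the k-th fastest replica, and at ALL k equals the replication factor, so the slowest replica sets the latency and one unresponsive replica fails the read.

```python
from typing import List, Optional


def required_responses(consistency: str, replication_factor: int) -> int:
    """Replica responses the coordinator waits for (single-DC levels only)."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def coordinator_latency_ms(
    replica_latencies_ms: List[Optional[float]], consistency: str
) -> Optional[float]:
    """Latency of one token-range read under a simplified model.

    Each entry is one replica's response time; None means the replica never
    answers. Returns None when the level cannot be satisfied (in a real
    cluster this surfaces as an unavailable or read-timeout error).
    """
    k = required_responses(consistency, len(replica_latencies_ms))
    answered = sorted(t for t in replica_latencies_ms if t is not None)
    if len(answered) < k:
        return None
    # The k-th fastest response is the one that lets the coordinator reply.
    return answered[k - 1]
```

With replica times of 5, 12 and 250 ms (replication factor 3), ONE completes after 5 ms, QUORUM after 12 ms and ALL after 250 ms. A full-ring range scan repeats this for every token range, so at ALL it effectively waits on every node that owns data.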

Re: ALL range query monitors failing frequently

2017-06-28 Thread Matthew O'Riordan
Hi Kurt, thanks for the response. A few comments inline: On Wed, Jun 28, 2017 at 1:17 PM, kurt greaves wrote: > You're correct in that the timeout is only driver side. The server will have its own timeouts configured in the cassandra.yaml file. > Yup, OK. I suspect

Re: ALL range query monitors failing frequently

2017-06-28 Thread kurt greaves
You're correct in that the timeout is only driver side. The server will have its own timeouts configured in the cassandra.yaml file. I suspect either that you have a node down in your cluster (or 4), or your queries are gradually getting slower. This kind of aligns with the slow query statements
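Because the driver and the server time out independently, it can help to reason about which side gives up first. The Python sketch below is a simplified model with hypothetical names. The server-side value would correspond to a setting such as `range_request_timeout_in_ms` in cassandra.yaml (Cassandra 3.x naming; check your version, as newer releases use different names), and the driver-side value to whatever per-request timeout the client driver is configured with. Neither value is taken from this thread.

```python
def first_to_give_up(
    elapsed_ms: float, driver_timeout_ms: float, server_timeout_ms: float
) -> str:
    """Which side ends a request that would otherwise take elapsed_ms.

    Simplified model: network transit is ignored, and on a tie the driver is
    assumed to win because the server's error reply still has to reach it.
    """
    if elapsed_ms < min(driver_timeout_ms, server_timeout_ms):
        return "completed"
    if server_timeout_ms < driver_timeout_ms:
        # The coordinator replies with a timeout error the client can inspect.
        return "server"
    # The client abandons the request; the cluster may still be working on it.
    return "driver"
```

Under this model, if the driver timeout is at or below the server's range timeout, a slow ALL range query shows up only as a client-side timeout, hiding the server's own error details (such as how many replicas responded). Keeping the driver timeout somewhat above the server timeout lets the server's error reach the monitor.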

ALL range query monitors failing frequently

2017-06-28 Thread Matthew O'Riordan
We have a monitoring service that runs on all of our Cassandra nodes and performs different query types to ensure the cluster is healthy. We use different consistency levels for the queries and alert if any of them fail. All of our query types consistently succeed apart from our ALL range
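A monitoring loop like the one described above might look roughly like the following Python sketch. The `execute` callable is a hypothetical stand-in for whatever actually issues the query (for example, a driver call with the consistency level set); it is not part of any real library API. Recording elapsed time alongside pass/fail makes a gradual slowdown visible before it turns into outright failures.

```python
import time
from typing import Callable, Dict, List, Tuple

Check = Tuple[str, str]  # (query type, consistency level)


def run_checks(
    checks: List[Check], execute: Callable[[str, str], None]
) -> Dict[Check, Tuple[bool, float]]:
    """Run each check once, recording success and elapsed milliseconds."""
    results: Dict[Check, Tuple[bool, float]] = {}
    for query_type, consistency in checks:
        start = time.monotonic()
        try:
            execute(query_type, consistency)
            ok = True
        except Exception:  # any error (timeout, unavailable, ...) is a failure
            ok = False
        elapsed_ms = (time.monotonic() - start) * 1000.0
        results[(query_type, consistency)] = (ok, elapsed_ms)
    return results


def alerts(results: Dict[Check, Tuple[bool, float]]) -> List[str]:
    """One alert message per failed check."""
    return [
        f"{query_type} query at {consistency} failed after {elapsed_ms:.0f} ms"
        for (query_type, consistency), (ok, elapsed_ms) in results.items()
        if not ok
    ]


if __name__ == "__main__":
    # Hypothetical executor that simulates only the ALL range check timing out.
    def fake_execute(query_type: str, consistency: str) -> None:
        if (query_type, consistency) == ("range", "ALL"):
            raise TimeoutError("simulated timeout")

    checks = [("point", "ONE"), ("point", "QUORUM"), ("range", "ALL")]
    for message in alerts(run_checks(checks, fake_execute)):
        print(message)
```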