Pluggable throttling of read and write queries

2017-02-17 Thread Abhishek Verma
Cassandra is being used on a large scale at Uber. We usually create dedicated clusters for each of our internal use cases; however, that is difficult to scale and manage. We are investigating the approach of using a single shared cluster with 100s of nodes that handles 10s to 100s of different use

Re: Count(*) is not working

2017-02-17 Thread kurt greaves
really... well that's good to know. it still almost never works though. i guess every time I've seen it it must have timed out due to tombstones. On 17 Feb. 2017 22:06, "Sylvain Lebresne" wrote: On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves wrote:

Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
typically when I've seen that gossip issue it requires more than just restarting the affected node to fix. if you're not getting query related errors in the server log you should start looking at what is being queried. are the queries that time out each day the same?

Re: High disk io read load

2017-02-17 Thread kurt greaves
what's the Owns % for the relevant keyspace from nodetool status?

Re: lots of connection timeouts around same time every day

2017-02-17 Thread Mike Torra
I can't say that I have tried that while the issue is going on, but I have done such rolling restarts for sure, and the timeouts still occur every day. What would a rolling restart do to fix the issue? In fact, as I write this, I am restarting each node one by one in the eu-west-1 datacenter, and

Re: Count(*) is not working

2017-02-17 Thread Sagar Jambhulkar
+1 for using spark for counts. On Feb 17, 2017 4:25 PM, "kurt greaves" wrote: > if you want a reliable count, you should use spark. performing a count (*) > will inevitably fail unless you make your server read timeouts and > tombstone fail thresholds ridiculous > > On 17

Re: Count(*) is not working

2017-02-17 Thread siddharth verma
Hi, We faced this issue too. You could try with a reduced paging size, so that the tombstone threshold isn't breached. Try using "paging 500" in cqlsh [ https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshPaging.html ]. Similarly, the paging size can be set in the Java driver as well. This is a work

Re: Count(*) is not working

2017-02-17 Thread Sylvain Lebresne
On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves wrote: > if you want a reliable count, you should use spark. performing a count (*) > will inevitably fail unless you make your server read timeouts and > tombstone fail thresholds ridiculous > That's just not true. count(*)

Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
have you tried a rolling restart of the entire DC?

Re: sasi index question (read timeout on many selects)

2017-02-17 Thread Benjamin Roth
Btw: They break incremental repair if you use CDC: https://issues.apache.org/jira/browse/CASSANDRA-12888 Not only when using CDC! You shouldn't use incremental repairs with MVs. Never (right now). 2017-02-16 17:42 GMT+01:00 Jonathan Haddad : > My advice to avoid them is

Re: High disk io read load

2017-02-17 Thread Benjamin Roth
Hi Nate, See the dstat results here: https://gist.github.com/brstgt/216c662b525a9c5b653bbcd8da5b3fcb Network volume does not correspond to disk IO, not even close. @heterogeneous vnode count: I did this to test how load behaves on a new server class we ordered for CS. The new nodes had much faster

Re: Count(*) is not working

2017-02-17 Thread kurt greaves
if you want a reliable count, you should use spark. performing a count (*) will inevitably fail unless you make your server read timeouts and tombstone fail thresholds ridiculous On 17 Feb. 2017 04:34, "Jan" wrote: > Hi, > > could you post the output of nodetool cfstats for the