Your fastest route might be to run a profiler on Cassandra and get some flame graphs. I'm a fan of the async-profiler:
https://github.com/jvm-profiling-tools/async-profiler

Joey Lynch did a nice write up in the documentation on a different process, which I haven't used yet:
http://cassandra.apache.org/doc/latest/troubleshooting/use_tools.html#cpu-flamegraphs

If I have some time today I'll put together a tlp-stress (https://github.com/thelastpickle/tlp-stress) workload to see if I can reproduce it locally.

Jon

On Mon, Jan 28, 2019 at 7:23 AM Tom Wollert <tomwoll...@codeweavers.net> wrote:

> Hello,
>
> We have noticed CPU usage spike after several minutes of consistent load
> when querying:
> - a single column of set<uuid> type (same partition key)
> - relatively frequently (couple hundred times per second, for comparison,
>   we do an order of magnitude more reads already with much bigger payloads)
> - with the elements in the set having a very short TTL (single digit
>   seconds) and several inserts per second
> - gc_grace set to 0 (that should remove hints and should prevent
>   tombstones)
> - reads and writes are using local quorum consistency
> - replication factor of 3 (on 4 node setup)
>
> I am struggling to figure out where the high CPU usage comes from (and
> thus how to resolve it) and hoping that some one sees what we are doing
> wrong. I'd expect the data to stay in memory on the cluster and have
> constant read time.
>
> The use case is rate limiting. We basically limit a user (for example) to
> 20 requests per 5 seconds and are using cassandra's TTL to implement it
> across all live servers.
> So when a request comes along we run the following query:
>
>     SELECT tokens
>     FROM recent_request_token_bucket
>     WHERE usagekey = 'some user id'
>
> If the tokens' count is less than 20 we execute
>
>     UPDATE recent_request_token_bucket
>     USING TTL 5
>     SET tokens = tokens + 'Guid.NewGuid()'
>     WHERE usagekey = 'some user id'
>
> If the tokens' count is greater than 20 we reject the request.
>
> The table definition is
>
>     CREATE TABLE recent_request_token_bucket
>     (
>         usagekey text,
>         tokens set<uuid>,
>         PRIMARY KEY (usagekey)
>     )
>     WITH
>     compaction={'min_threshold': 2, 'class': 'SizeTieredCompactionStrategy', 'max_threshold': 32}
>     AND
>     compression={'sstable_compression': 'SnappyCompressor'}
>     AND
>     gc_grace_seconds=0;
>
> I have replicated it with the following:
> 200 reads per second
> 3 inserts per second
>
> This starts off with a CPU load of ~10% and an average response time (as
> reported by my console app) of 1-2 ms.
> After 5 minutes the CPU load creeps up to ~20% and the average response
> time to 2-4 ms.
> After 10 minutes the CPU load is over 50% and average response times start
> to hit 10 ms.
> After 15 minutes the CPU load is near 100% and response times over 100 ms
> become normal.
>
> Interestingly, when aborting the application, waiting several minutes and
> then restarting, the response times and CPU load on the server remain
> terrible. It's like I poisoned that partition key permanently. This also
> survives flushes of the memtable.
>
> I'd expect a constant response time in our use case as there should be no
> more than 20 odd guids in the set. But it appears that cassandra maintains
> the tombstones in memory?
>
> We are running 2.1.20
>
> I'd appreciate any pointers!
>
> Cheers,
>
> Tom
>
> --
> Development Director
>
> | T: 0800 021 0888 | M: 0790 4489797 | www.codeweavers.net |
> | Codeweavers Limited | Barn 4 | Dunston Business Village | Dunston | ST18 9AB |
> | Registered in England and Wales No. 04092394 | VAT registration no.
> 974 9705 63 |

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
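
For readers following the thread, the check-then-write flow described in the quoted message (read the set, count the live tokens, add a new token with a 5 second TTL if under the limit) can be sketched roughly as below. This is only an in-memory illustration in Python, not the poster's code and not a Cassandra driver call: the `TokenBucket` class, its method names, and the explicit `now` argument are all hypothetical, and rejecting at exactly 20 tokens is an assumption, since the message only covers "less than 20" and "greater than 20".

```python
import uuid


class TokenBucket:
    """In-memory stand-in for the recent_request_token_bucket table.

    Hypothetical illustration only: a dict keyed by usagekey replaces the
    Cassandra partition, and an expiry timestamp replaces the cell TTL.
    """

    def __init__(self, limit=20, ttl_seconds=5.0):
        self.limit = limit
        self.ttl_seconds = ttl_seconds
        self.rows = {}  # usagekey -> {token: expiry_time}

    def live_tokens(self, usagekey, now):
        """Roughly the SELECT: return tokens whose TTL has not elapsed."""
        row = self.rows.get(usagekey, {})
        return {tok: exp for tok, exp in row.items() if exp > now}

    def allow(self, usagekey, now):
        """Return True and record a token if the caller is under the limit."""
        live = self.live_tokens(usagekey, now)
        if len(live) >= self.limit:  # assumption: reject at the limit itself
            self.rows[usagekey] = live
            return False
        # Roughly the UPDATE ... USING TTL 5: add one fresh token.
        live[uuid.uuid4()] = now + self.ttl_seconds
        self.rows[usagekey] = live
        return True


if __name__ == "__main__":
    bucket = TokenBucket()
    decisions = [bucket.allow("some user id", now=0.0) for _ in range(25)]
    print(decisions.count(True), "allowed,", decisions.count(False), "rejected")
```

Unlike this sketch, the real flow is two separate statements, so concurrent requests can both read a count below the limit before either write lands. The sketch also discards expired tokens immediately, which says nothing about how Cassandra itself handles expired cells.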