> I actually have the opposite 'problem'. I have a pair of servers that have
> been static since mid last week, but have seen performance vary
> significantly (x10) for exactly the same query. I hypothesised it was
> various caches so I shut down Cassandra, flushed the O/S buffer cache and
> then brought it back up. The performance wasn't significantly different to
> the pre-flush performance

I don't get this thread at all :) Why would restarting with clean caches be expected to *improve* performance? And why is key cache loading involved, other than to delay start-up and, hopefully, pre-populate caches for better (not worse) performance?

If you want to figure out why queries seem slow relative to normal, you'll need to monitor the behavior of the nodes. Look primarily at disk I/O statistics (everyone reading this who runs Cassandra and isn't intimately familiar with "iostat -x -k 1" should go and read up on it right away; make sure you understand the utilization and avg queue size columns), then CPU usage, whether compaction is happening, etc.

One easy way to see sudden bursts of poor behavior is to be heavily reliant on cache, and then have sudden decreases in performance due to compaction evicting data from page cache while also generating more I/O. But that's total speculation. It is also the case that you cannot expect consistent performance on EC2, and that might be it.

But my #1 advice: log into the node while it is being slow, and observe. Figure out what the bottleneck is: iostat, top, nodetool tpstats, nodetool netstats, nodetool compactionstats.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
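
As a rough illustration of the iostat advice above, here is a minimal sketch that reads one report of "iostat -x -k" output and flags devices that look saturated. Everything in it is an assumption for illustration: the function names, the 90% threshold, and the SAMPLE numbers are made up, and the column layout differs between sysstat versions (older ones print avgqu-sz, newer ones aqu-sz), so columns are looked up by header name rather than position.

```python
# Hypothetical helper, not part of Cassandra or sysstat.
# SAMPLE is invented data in the style of older `iostat -x -k` output.
SAMPLE = """\
Device:  rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda      0.00 1.00 5.00 2.00 40.00 12.00 14.86 0.02 2.50 1.10 0.77
sdb      0.00 85.00 310.00 95.00 39680.00 12160.00 256.00 24.70 61.00 2.45 99.30
"""


def parse_iostat_devices(text):
    """Parse device tables from iostat output into {device: {column: float}}."""
    devices = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]  # column names, e.g. "%util"
            continue
        if header is not None and len(fields) == len(header) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # skip rows that are not purely numeric
            devices[fields[0]] = dict(zip(header, values))
    return devices


def queue_size(cols):
    """Return the average queue size, whichever name this sysstat uses."""
    for key in ("avgqu-sz", "aqu-sz"):
        if key in cols:
            return cols[key]
    return None


def saturated(devices, util_threshold=90.0):
    """Return sorted names of devices with %util at or above the threshold."""
    return sorted(
        name for name, cols in devices.items()
        if cols.get("%util", 0.0) >= util_threshold
    )


if __name__ == "__main__":
    devices = parse_iostat_devices(SAMPLE)
    for name in saturated(devices):
        cols = devices[name]
        print(name, "util:", cols["%util"], "queue:", queue_size(cols))
```

A device sitting near 100% utilization with a deep queue while queries are slow points at disk I/O as the bottleneck; that is only a starting point, and the nodetool commands are still needed to tell whether compaction is the cause.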