> I've got a batch process running every so often that issues a bunch of
> counter increments. I have noticed that when this process runs without being
> throttled it will raise the CPU to 80-90% utilization on the nodes handling
> the requests. This in turns timeouts and general lag on queries running on
> the cluster.

This much is entirely expected. If you are not bottlenecked anywhere else and you are saturating the cluster, you will be bound by its capacity, and that will affect the latency of other traffic, no matter how fast or slow Cassandra is.

You do say "nodes handling the requests". Two things to always keep in mind are to (1) spread the requests evenly across all members of the cluster, and (2) if you are doing a lot of work per row key, spread it around and be concurrent so that you're not hitting a single row at a time, which will be the responsibility of a single set of RF nodes (you want to put load on the entire cluster evenly if you want to maximize throughput).

> Is there anything that can be done to increase the throughput, I've been
> looking on the wiki and the mailing list and didn't find any optimization
> suggestions (apart from spreading the load on more nodes).
>
> Cluster is 5 node, BOP, RF=3, AMD opteron 4174 CPU (6 x 2.3 Ghz cores),
> Gigabit ethernet, RAID-0 SATA2 disks

For starters, what *is* the throughput? How many counter mutations are you submitting per second?

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
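
As a rough illustration of points (1) and (2) above, here is a minimal Python sketch of how a batch job might order its increments before sending them. It does not talk to Cassandra at all: `plan_increments`, the node names and the workload are all hypothetical, and the actual sending (ideally from several concurrent workers) is left to whatever client you use.

```python
import itertools
import random
from collections import Counter


def plan_increments(increments, coordinators, seed=None):
    """Assign each (row_key, amount) increment to a coordinator node.

    Shuffling interleaves row keys so that consecutive requests do not all
    land on the same set of RF replicas; round-robin over the coordinators
    spreads the request-handling work evenly across the cluster.
    """
    shuffled = list(increments)
    random.Random(seed).shuffle(shuffled)
    ring = itertools.cycle(coordinators)
    return [(next(ring), row_key, amount) for row_key, amount in shuffled]


# Hypothetical workload: 3 row keys with 10 increments each, 5 nodes.
nodes = ["node1", "node2", "node3", "node4", "node5"]
batch = [(f"row{k}", 1) for k in range(3) for _ in range(10)]

plan = plan_increments(batch, nodes, seed=42)
per_node = Counter(node for node, _, _ in plan)
print(per_node)
```

Timing how long the real job takes to submit such a plan and dividing the number of increments by that duration would also give the mutations-per-second figure asked about above.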