I've got a 3-node Cassandra cluster (16 GB, 4-core VMs on ESXi v5, running on 2.5 GHz machines not shared with any other VMs). I'm inserting time-series data into a single column family using "wide rows" (timeuuids) with a 3-part partition key, so my primary key is something like ((a, b, day), in-time-uuid, x, y, z).

My Java client is feeding rows (about 1 KB of raw data each) in batches using multiple threads, and the fastest I can get it to run reliably is about 2000 rows/second. Even at that speed, all 3 Cassandra nodes are heavily CPU-bound, with load averages of 6-9 each, while the client machine is hardly breaking a sweat. I've tried turning off compression on my table, which reduced the load slightly but not by much. There are no other updates or reads occurring, apart from DataStax OpsCenter.

I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and Google results, I can't figure out what's slowing things down. When I push the client's insert rate beyond 2000 rows/second, the server responses become too slow and the client falls behind. I had a single-node MySQL database that could handle 10k of these rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?
