I've got a 3-node Cassandra cluster (16 GB, 4-core VMs under ESXi v5 on
2.5 GHz machines not shared with any other VMs). I'm inserting time-series
data into a single column family using "wide rows" (timeuuids) and have a
3-part partition key, so my primary key is something like ((a, b, day),
in-time-uuid), with x, y, z as the remaining columns.
My Java client is feeding rows (about 1 KB of raw data each) in
batches using multiple threads, and the fastest I can get it to run
reliably is about 2000 rows/second. Even at that speed, all 3 Cassandra
nodes are very CPU bound, with loads of 6-9 each (and the client machine
is hardly breaking a sweat). I've tried turning off compression on my
table, which reduced the loads slightly but not by much. There are no
other updates or reads occurring, except from DataStax OpsCenter.
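For reference, here is a minimal sketch of the general shape of a multi-threaded batching client like the one described above. It is not the actual client code: the class name `BatchFeeder`, the methods `partition` and `feed`, and the `sink` callback are all hypothetical, and `sink` only stands in for whatever driver call would execute one batch against Cassandra. The batch size and thread count in `main` are arbitrary illustration values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class BatchFeeder {

    // Split rows into consecutive batches of at most batchSize rows each.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    // Hand every batch to `sink` on a fixed pool of `threads` workers and
    // return the observed throughput in rows/second.
    // ASSUMPTION: `sink` is a placeholder for the real "execute this batch" call.
    static <T> double feed(List<T> rows, int batchSize, int threads,
                           Consumer<List<T>> sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                pending.add(pool.submit(() -> sink.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion; rethrows any failure from sink
            }
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return rows.size() / seconds;
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add("row-" + i);
        }
        AtomicLong sent = new AtomicLong();
        // The sink here only counts rows; a real client would write to the cluster.
        double rate = feed(rows, 100, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println("rows sent: " + sent.get());
        System.out.println("rows/second (no-op sink): " + rate);
    }
}
```

With a no-op sink the measured rate only reflects client-side overhead, which makes it a baseline for comparison against the rate seen when the sink actually writes to the cluster.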
I was expecting to be able to insert at least 10k rows/second with this
configuration, and after a lot of reading of docs, blogs, and Google
results, I can't really figure out what's slowing my client down. When I
increase the insert speed of my client beyond 2000 rows/second, the server
responses are just too slow and the client falls behind. I had a
single-node MySQL database that could handle 10k of these data
rows/second, so I really feel like I'm missing something in Cassandra.
Any ideas?
- insert performance (1.2.8) Keith Freeman