AFAIK, you were using RF 3 in a 3-node cluster, so all your nodes had all 
your data. 
When the number of nodes started to grow, this assumption stopped being true.
I think Cassandra will scale linearly from 9 nodes on, but comparing against a 
setup where every node holds all your data is not really fair, as in that 
setup Cassandra behaves, for reads, like a database with two more replicas.
I may be wrong, but this is my take.
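
To put rough numbers on that, here is a small back-of-the-envelope sketch. It
assumes evenly balanced token ranges and a client that picks coordinators
uniformly at random (no token-aware routing); both are my assumptions, not
something stated in the original report, and the function name is made up for
illustration.

```python
def replica_fraction(replication_factor: int, node_count: int) -> float:
    """Fraction of the data set stored on each node.

    Under the assumptions above, this is also the chance that a randomly
    chosen coordinator holds a replica of the requested row and can answer
    a CL=ONE read without an extra hop to another node.
    """
    return min(replication_factor, node_count) / node_count


if __name__ == "__main__":
    rf = 3  # replication factor from the original report
    for nodes in (3, 6, 9):
        frac = replica_fraction(rf, nodes)
        print(f"{nodes} nodes, RF {rf}: each node holds {frac:.0%} of the data")
```

With 3 nodes every read can be served locally by whichever node receives it;
with 6 or 9 nodes a growing share of requests has to be forwarded to a
replica, which is extra work the 3-node baseline never pays.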
From: user@cassandra.apache.org 
Subject: Re:Adding more nodes causes performance problem

I have a cluster with 3 nodes; the only keyspace has a replication factor of 
3,
and the application reads/writes UUID-keyed data. I use CQL (cassandra-python);
most writes are done by execute_async, and most reads are done with a
consistency level of ONE. Overall performance in this setup is better than I
expected.

Then I tested a 6-node cluster and a 9-node cluster. The performance (both
read and write) got worse and worse. Roughly speaking, 6 nodes is about 2~3
times slower than 3 nodes, and 9 nodes is about 5~6 times slower than
3 nodes. All tests were done with the same data set, the same test program,
and the same client machines, multiple times. I'm running Cassandra 2.1.2
with the default configuration.

What I observed is that with 6 nodes and 9 nodes, the Cassandra servers
were doing OK with IO, but CPU utilization was about 60%~70% higher than
with 3 nodes.

I'd like suggestions on how to troubleshoot this, as it is totally against
what I have read, that Cassandra scales linearly.

