VNodes, Replication and Minimum cluster size
I have heard before that the recommended minimum cluster size is 4 (with a replication factor of 3). I am curious to know if vnodes would change that, or if that statement was valid to begin with! The use case I am working on is one where we see a tremendous amount of load for just 2 days out of the week, and the rest of the time the cluster is pretty much idle. It appears that vnodes will allow me to auto-scale the cluster's size a little more easily, but I am wondering what the smallest physical server count is that still gives me a good replication count. I'll panic about having 1 of 2 or 1 of 3 servers going down in an outage as a separate topic, alone at night while not sleeping. Thanks! Ryan
Re: VNodes, Replication and Minimum cluster size
> I am curious to know if vnodes would change that or if that statement was valid to begin with!

Jonathan Ellis answered this question yesterday during the DataStax C*ollege webinar (near the end of the video): http://www.datastax.com/resources/webinars/whatsnewincassandra12

The answer is no. Vnodes don't change anything about the number of nodes, the RF, or single points of failure (SPOF).

I don't know why you would have to start with 4 nodes. In my opinion 3 nodes should be enough, or even 1 or 2 if you don't care about consistency or SPOF.

The point of having 3 nodes is that it allows you to read and write at CL QUORUM, which ensures that you retrieve consistent data. If potentially inconsistent data is not a problem for you, you can use 2 nodes with RF = 2 and do both reads and writes at CL ONE; you'll have no SPOF. You can work with one node, but that's not really interesting since you don't benefit from Cassandra at all.

With 4 nodes instead of 3 at RF = 3, you start to gain performance because you no longer have to write everything to every node. But there is no need to start with 4 nodes.

I hope I've been clear, since English is not my mother tongue.

Alain
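To put a number on the 3-versus-4-node remark, here is a minimal sketch of how much of the data set each node stores. It assumes a perfectly balanced ring and a single data center; the function name `data_fraction_per_node` is made up for this illustration and is not part of Cassandra.

```python
from fractions import Fraction


def data_fraction_per_node(nodes: int, rf: int) -> Fraction:
    """Fraction of the full data set stored on each node.

    Assumption: the ring is evenly balanced, so every node holds an
    equal share of the rf copies of each row.
    """
    if nodes < 1 or rf < 1:
        raise ValueError("nodes and rf must be positive")
    if rf > nodes:
        raise ValueError("rf cannot exceed the number of nodes in this sketch")
    return Fraction(rf, nodes)


# 3 nodes at RF = 3: every node stores (and receives every write for) all data.
three_nodes = data_fraction_per_node(3, 3)
# 4 nodes at RF = 3: each node stores three quarters of the data.
four_nodes = data_fraction_per_node(4, 3)
```

Under these assumptions, the fourth node is the first point at which a write no longer touches every machine, which is where the performance gain mentioned above starts.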
Re: VNodes, Replication and Minimum cluster size
On 10 January 2013 13:07, Ryan Lowe ryanjl...@gmail.com wrote:

> I have heard before that the recommended minimum cluster size is 4 (with replication factor of 3). I am curious to know if vnodes would change that or if that statement was valid to begin with!

The reason that RF=3 is recommended is that it is the minimum RF that allows you to have both strong consistency and tolerance of one node failure (reading and writing at consistency level QUORUM). With RF=2, for example, you would have to choose between strong consistency (read/write at CL=QUORUM) and tolerating one node failure (read/write at CL=ONE).

> The use case I am working on is one where we see tremendous amount of load for just 2 days out of the week and the rest of the time the cluster is pretty much idle. It appears that vnodes will allow me to auto-scale the clusters size a little easier,

The key advantage of vnodes in this case is that you do not need to manually rebalance the cluster when adding or removing nodes.

> but I am wondering what is the smallest I can get the cluster in physical server count and still have a good replication count.

3 nodes at RF=3 would be the smallest advisable size. You could even drop this to 2 nodes if you did not need consistency or availability guarantees while the cluster is at its smallest size.

> I'll panic about having 1 of 2 or 1 of 3 servers going down in an outage as a separate topic alone at night while not sleeping.

Make sure that you over-provision sufficiently, so that 2 nodes can handle the load that 3 nodes would normally be taking if a node fails! (Or more generally, ensure N-1 nodes can handle the load of N nodes.) Testing a simulated load on a test cluster with one node down is always a good way to increase your confidence about availability and to explore any potential degradation in your client application (latency etc.).

-- Sam Overton Acunu | http://www.acunu.com | @acunu
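The trade-off between RF=2 and RF=3 comes down to simple arithmetic, sketched below. QUORUM needs floor(RF/2) + 1 replicas, and reads are guaranteed to see the latest write when the read and write replica counts together exceed RF. The helper names are invented for this illustration, and the load factor assumes load is spread evenly across nodes.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at consistency level QUORUM."""
    return rf // 2 + 1


def is_strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


def tolerated_failures(rf: int, replicas_required: int) -> int:
    """Replicas that can be down while requests still succeed."""
    return rf - replicas_required


def summarize(rf: int) -> dict:
    """Consistency and availability for reads and writes at QUORUM and at ONE."""
    q = quorum(rf)
    return {
        "QUORUM": {
            "strong": is_strongly_consistent(rf, q, q),
            "tolerates": tolerated_failures(rf, q),
        },
        "ONE": {
            "strong": is_strongly_consistent(rf, 1, 1),
            "tolerates": tolerated_failures(rf, 1),
        },
    }


def surviving_load_factor(nodes: int) -> float:
    """How much more load each remaining node carries after one node fails.

    Assumption: load is evenly distributed before and after the failure.
    """
    if nodes < 2:
        raise ValueError("need at least 2 nodes to survive a failure")
    return nodes / (nodes - 1)


rf2 = summarize(2)  # QUORUM: strong but tolerates 0; ONE: tolerates 1 but not strong
rf3 = summarize(3)  # QUORUM: strong and tolerates 1 failure
factor = surviving_load_factor(3)  # each of the 2 survivors takes 1.5x its usual load
```

With RF=2, QUORUM is 2 of 2 replicas, so any single failure blocks QUORUM requests. With RF=3, QUORUM is 2 of 3, which leaves room for one replica to be down.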
Re: VNodes, Replication and Minimum cluster size
> The key advantage of vnodes in this case is that you do not need to manually rebalance the cluster when adding or removing nodes.

Well, I think an even bigger advantage of vnodes is the performance improvement you get from the load being evenly distributed while streaming data. But they do indeed simplify cluster balancing, even if you have a heterogeneous cluster with different hardware for each node.

Anyway, this is well described here: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

Alain
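As a rough illustration of why many tokens per node even things out, the sketch below places random tokens on a unit ring and compares how much of the ring each node owns with 1 token per node versus 256. This is a simplified model and not Cassandra's actual token allocation: it ignores replication and streaming, the ring is the interval [0, 1), and the function names and the seed are arbitrary choices for the example.

```python
import random


def ownership(nodes: int, tokens_per_node: int, seed: int = 0) -> list:
    """Fraction of a unit ring owned by each node with randomly placed tokens.

    Each token owns the range from the previous token on the ring
    (exclusive) up to itself (inclusive).
    """
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node)
        for node in range(nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0.0] * nodes
    # The first token also owns the wrap-around range after the last token.
    previous = ring[-1][0] - 1.0
    for token, node in ring:
        owned[node] += token - previous
        previous = token
    return owned


def imbalance(owned: list) -> float:
    """Ratio between the most and least loaded node (1.0 is perfectly even)."""
    return max(owned) / min(owned)


single_token = imbalance(ownership(nodes=6, tokens_per_node=1))
many_tokens = imbalance(ownership(nodes=6, tokens_per_node=256))
```

With one random token per node the largest share is typically many times the smallest, which is why such rings had to be balanced by hand. With hundreds of tokens per node the shares average out, and a joining or leaving node exchanges small ranges with many peers instead of large ranges with a few neighbours.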