VNodes, Replication and Minimum cluster size

2013-01-10 Thread Ryan Lowe
I have heard before that the recommended minimum cluster size is 4 (with
replication factor of 3).  I am curious to know if vnodes would change that
or if that statement was valid to begin with!

The use case I am working on is one where we see a tremendous amount of load
for just 2 days out of the week and the rest of the time the cluster is
pretty much idle.  It appears that vnodes will allow me to auto-scale the
cluster's size a little more easily, but I am wondering what is the smallest
I can get the cluster in physical server count and still have a good
replication count.

I'll panic about having 1 of 2 or 1 of 3 servers going down in an outage as
a separate topic alone at night while not sleeping.

Thanks!
Ryan


Re: VNodes, Replication and Minimum cluster size

2013-01-10 Thread Alain RODRIGUEZ
> I am curious to know if vnodes would change that or if that statement was
> valid to begin with!

This question was answered yesterday by Jonathan Ellis during the DataStax
C*ollege Webinar:
http://www.datastax.com/resources/webinars/whatsnewincassandra12 (near the
end of the video).

The answer is no. Vnodes don't change anything concerning the number of
nodes, the RF or the SPOF.

I don't know why you would need to start with 4 nodes. IMHO, 3 nodes should
be enough, or even 1 or 2 if you don't care about consistency or a SPOF.

The point of having 3 nodes is that it allows you to write and read using CL
QUORUM, which ensures that you retrieve consistent data.
If potentially inconsistent data is not a problem for you, you can use 2
nodes with RF = 2 and do both reads and writes with CL ONE; you'll have no
SPOF. You can work with one node, but that's not really interesting since
you don't benefit from Cassandra at all.

With 4 nodes instead of 3 at RF = 3, you start increasing your
performance because not every write has to go to every node. But
there is no need to start with 4 nodes at all.
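
To illustrate the point about 4 nodes versus 3, here is a minimal Python
sketch. It is not from the thread; the function name is made up for
illustration, and it assumes token ranges are perfectly balanced, so each
node holds roughly RF / N of the data.

```python
from fractions import Fraction


def data_fraction_per_node(nodes: int, rf: int) -> Fraction:
    """Approximate share of the data set stored on each node.

    Assumes evenly balanced token ranges (an idealisation). A cluster
    cannot place more replicas than it has nodes, so RF is capped at
    the node count.
    """
    if nodes <= 0 or rf <= 0:
        raise ValueError("nodes and rf must be positive")
    effective_rf = min(rf, nodes)
    return Fraction(effective_rf, nodes)


if __name__ == "__main__":
    for nodes in (3, 4, 6):
        share = data_fraction_per_node(nodes, rf=3)
        print(f"{nodes} nodes, RF=3: each node stores {float(share):.0%} of the data")
```

Under these assumptions, 3 nodes at RF = 3 each hold a full copy of the
data, while 4 nodes each hold three quarters of it, which is where the
per-node write load starts to drop.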

I hope I've been clear, since English is not my mother tongue.

Alain


Re: VNodes, Replication and Minimum cluster size

2013-01-10 Thread Sam Overton
On 10 January 2013 13:07, Ryan Lowe ryanjl...@gmail.com wrote:
> I have heard before that the recommended minimum cluster size is 4 (with
> replication factor of 3).  I am curious to know if vnodes would change that
> or if that statement was valid to begin with!

The reason that RF=3 is recommended is that it is the minimum RF that
allows you to have both strong consistency and tolerate one node
failure (reading and writing at consistency level = QUORUM). With RF=2
for example you would have to choose between having strong consistency
(read/write at CL=QUORUM) or tolerating one node failure (read/write
at CL=ONE).
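
The arithmetic behind this can be sketched in a few lines of Python. This is
only an illustration, with hypothetical helper names; it relies on the usual
rules that QUORUM means floor(RF / 2) + 1 replicas and that reads see the
latest write when read replicas + write replicas > RF.

```python
def quorum(rf: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1


def tolerated_failures(rf: int, replicas_required: int) -> int:
    """Replicas that can be down while the operation can still succeed."""
    return rf - replicas_required


def is_strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """A read overlaps the latest write when R + W > RF."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (2, 3):
        q = quorum(rf)
        print(
            f"RF={rf}: QUORUM={q} replicas, "
            f"tolerates {tolerated_failures(rf, q)} replica(s) down at QUORUM, "
            f"QUORUM/QUORUM consistent: {is_strongly_consistent(rf, q, q)}, "
            f"ONE/ONE consistent: {is_strongly_consistent(rf, 1, 1)}"
        )
```

At RF=2, QUORUM needs both replicas, so no failure is tolerated; at RF=3 it
needs two of three, so one replica can be down.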

> The use case I am working on is one where we see a tremendous amount of load
> for just 2 days out of the week and the rest of the time the cluster is
> pretty much idle.  It appears that vnodes will allow me to auto-scale the
> cluster's size a little easier,

The key advantage of vnodes in this case is that you do not need to
manually rebalance the cluster when adding or removing nodes.

> but I am wondering what is the smallest I can
> get the cluster in physical server count and still have a good replication
> count.

3 nodes at RF=3 would be the smallest advisable size. You could even
drop this to 2 nodes if you did not need consistency or availability
guarantees during the time the cluster is at the smallest size.

> I'll panic about having 1 of 2 or 1 of 3 servers going down in an outage as
> a separate topic alone at night while not sleeping.

Make sure that you over-provision sufficiently so that 2 nodes can
handle the load that 3 nodes would normally be taking in the event
that a node fails! (Or more generally, ensure N-1 nodes can handle the
load of N nodes). Testing a simulated load on a test cluster with one
node failure is always a good way to increase your confidence about
availability and explore any potential degradation in your client
application (latency etc.).
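
As a rough back-of-the-envelope check of the N-1 rule, something like the
following Python sketch could be used. The function names and the figures in
the example are hypothetical, and it assumes the load spreads evenly over the
surviving nodes, which is a simplification; a real load test is still the
better guide.

```python
def per_node_load(total_load: float, nodes: int, failed: int = 0) -> float:
    """Load each surviving node carries, assuming an even spread."""
    surviving = nodes - failed
    if surviving <= 0:
        raise ValueError("no surviving nodes to carry the load")
    return total_load / surviving


def survives_failure(total_load: float, nodes: int,
                     node_capacity: float, failed: int = 1) -> bool:
    """True if the remaining nodes stay within their per-node capacity."""
    return per_node_load(total_load, nodes, failed) <= node_capacity


if __name__ == "__main__":
    # Hypothetical figures: 9000 ops/s at peak across a 3-node cluster.
    peak_load = 9000.0
    for capacity in (4000.0, 5000.0):
        ok = survives_failure(peak_load, nodes=3, node_capacity=capacity)
        print(f"capacity {capacity:.0f} ops/s per node: "
              f"{'OK' if ok else 'overloaded'} with one node down")
```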

> Thanks!
> Ryan


-- 
Sam Overton
Acunu | http://www.acunu.com | @acunu


Re: VNodes, Replication and Minimum cluster size

2013-01-10 Thread Alain RODRIGUEZ
> The key advantage of vnodes in this case is that you do not need to
> manually rebalance the cluster when adding or removing nodes.

Well, I think an even bigger advantage of vnodes is the performance
improvement that comes from distributing the load evenly while streaming
data.

But they do indeed simplify cluster balancing, even if you have a
heterogeneous cluster with different hardware for each node.

Anyway, this is well described here:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

Alain
