Re: Heterogenous cluster and vnodes

2014-08-30 Thread Ben Bromhead

 Hey,
 
 I have a few VM host (bare metal) machines with varying amounts of free
 hard drive space on them. For simplicity, let’s say I have three machines like
 so:
  * Machine 1:
   - Hard drive 1: 150 GB available.
  * Machine 2:
   - Hard drive 1: 150 GB available.
   - Hard drive 2: 150 GB available.
  * Machine 3:
   - Hard drive 1: 150 GB available.
 
 I am setting up a Cassandra cluster between them and as I see it I have two 
 options:
 
 1. I set up one Cassandra node/VM per bare metal machine. I assign all free
 hard drive space on each machine to its Cassandra node and balance the
 cluster by assigning vnodes in proportion to the amount of free hard drive
 space (CPU/RAM is not going to be a bottleneck here).
 
 2. I set up four VMs, each running a Cassandra node with an equal amount of
 hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup can create a situation where, if Machine 2 goes down, you lose two
replicas at once, because the two VMs on Machine 2 might both hold replicas of
the same key.
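
To make that risk concrete, here is a minimal sketch that simulates replica placement for option 2. It is not Cassandra's actual placement code: it assumes a replication factor of 2 (the thread does not state one), random tokens, and a simple clockwise walk around the ring that ignores racks. The VM names and the `HOST_OF` mapping are hypothetical.

```python
import random

# Hypothetical mapping from each VM (Cassandra node) to its bare metal host.
HOST_OF = {"vm1": "machine1", "vm2a": "machine2", "vm2b": "machine2", "vm3": "machine3"}


def build_ring(nodes, tokens_per_node, seed=0):
    """Give every node the same number of random tokens and sort them into a ring."""
    rng = random.Random(seed)
    ring = [(rng.random(), node) for node in nodes for _ in range(tokens_per_node)]
    return sorted(ring)


def replicas_for(ring, index, rf):
    """Walk clockwise from ring[index] and collect the first rf distinct nodes."""
    chosen = []
    for step in range(len(ring)):
        node = ring[(index + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


ring = build_ring(list(HOST_OF), tokens_per_node=256)

# Count token ranges whose replicas all live on one physical machine.
colocated = 0
for i in range(len(ring)):
    hosts = {HOST_OF[node] for node in replicas_for(ring, i, rf=2)}
    if len(hosts) == 1:
        colocated += 1

print(f"{colocated} of {len(ring)} token ranges keep both replicas on one machine")
```

Under these assumptions, some token ranges end up with both replicas on Machine 2, so losing that one host makes those ranges unavailable even though Cassandra sees two healthy, independent nodes right up to the failure.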

 
 General question: Is either of these preferable to the other? I understand 1)
 yields lower availability (since nodes are on the same hardware).

Other way around (2 would potentially have lower availability)… Cassandra
thinks the two VMs are separate when in fact they rely on the same underlying
machine.

 
 Question about alternative 1: With varying vnodes, can I always be sure that 
 replicas are never put on the same virtual machine?

Yes… mostly. See https://issues.apache.org/jira/browse/CASSANDRA-4123

 Or is varying vnodes really only useful/recommended when migrating from
 machines with varying hardware (as mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is
responsible for. You can use it to account for different types of hardware. You
can also use it to create awesome situations like hotspots if you aren't
careful… ymmv.
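
As a rough illustration of option 1, the sketch below scales each node's `num_tokens` (the vnode count set in cassandra.yaml) by its free disk space and estimates the share of the ring each node would own. The helper names are hypothetical, a baseline of 256 tokens for the smallest node is an assumption, and real ownership will vary somewhat because tokens are assigned randomly.

```python
# Assumed baseline: the smallest node gets 256 tokens.
BASE_TOKENS = 256


def tokens_by_capacity(disk_gb, base_tokens=BASE_TOKENS):
    """Scale num_tokens so each node's count is proportional to its disk space."""
    smallest = min(disk_gb.values())
    return {node: round(base_tokens * gb / smallest) for node, gb in disk_gb.items()}


def expected_ownership(tokens):
    """Approximate each node's share of the ring as its share of all tokens."""
    total = sum(tokens.values())
    return {node: count / total for node, count in tokens.items()}


# Free disk space per machine, taken from the example in this thread.
disks = {"machine1": 150, "machine2": 300, "machine3": 150}

tokens = tokens_by_capacity(disks)
ownership = expected_ownership(tokens)

for node in disks:
    print(f"{node}: num_tokens={tokens[node]}, expected share={ownership[node]:.0%}")
```

With these numbers Machine 2 ends up with roughly half of the ring, which means roughly half of the data and of the request load as well. That is the hotspot risk: disk space is balanced, but CPU, RAM, and I/O on that one machine may not be.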

At the end of the day I would throw out the extra hard drive / not use it / put 
more hard drives in the other machines. Why? Hard drives are cheap and your 
time as an admin for the cluster isn't. If you do add more hard drives, you can
also split out the commit log, etc., onto different disks.

I would take fewer problems over trying to squeeze every last scrap of
performance out of the available hardware any day of the year.


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359



Heterogenous cluster and vnodes

2014-08-29 Thread Jens Rantil
Hey,


I have a few VM host (bare metal) machines with varying amounts of free hard
drive space on them. For simplicity, let’s say I have three machines like so:
 * Machine 1:
  - Hard drive 1: 150 GB available.
 * Machine 2:
  - Hard drive 1: 150 GB available.
  - Hard drive 2: 150 GB available.
 * Machine 3:
  - Hard drive 1: 150 GB available.

I am setting up a Cassandra cluster between them and as I see it I have two 
options:


1. I set up one Cassandra node/VM per bare metal machine. I assign all free
hard drive space on each machine to its Cassandra node and balance the cluster
by assigning vnodes in proportion to the amount of free hard drive space
(CPU/RAM is not going to be a bottleneck here).


2. I set up four VMs, each running a Cassandra node with an equal amount of
hard drive space and an equal number of vnodes. Machine 2 runs two VMs.



General question: Is either of these preferable to the other? I understand 1)
yields lower availability (since nodes are on the same hardware).


Question about alternative 1: With varying vnodes, can I always be sure that
replicas are never put on the same virtual machine? Or is varying vnodes really
only useful/recommended when migrating from machines with varying hardware
(as mentioned in [1])?


[1] http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2


Thanks,
Jens
———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
