Hey,
I have a few VM host (bare metal) machines with varying amounts of free
hard drive space on them. For simplicity, let’s say I have three machines like
so:
* Machine 1:
- Hard drive 1: 150 GB available.
* Machine 2:
- Hard drive 1: 150 GB available.
- Hard drive 2: 150 GB available.
* Machine 3:
- Hard drive 1: 150 GB available.
I am setting up a Cassandra cluster between them and as I see it I have two
options:
1. I set up one Cassandra node/VM per bare metal machine. I assign all free
hard drive space to each Cassandra node and I balance the cluster by assigning
vnodes in proportion to the amount of free hard drive space (CPU/RAM is not
going to be a bottleneck here).
2. I set up four VMs, each running a Cassandra node with an equal amount of hard
drive space and an equal number of vnodes. Machine 2 runs two VMs.
This setup can create a situation where, if Machine 2 goes down, you lose two
replicas at once, because the two VMs on Machine 2 might both be replicas for
the same key.
General question: Is either of these preferable to the other? I understand 1)
yields lower availability (since nodes are on the same hardware).
It's the other way around (2 would potentially have lower availability)…
Cassandra thinks two of the VMs are separate when they in fact rely on the same
underlying machine.
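To make the risk with option 2 concrete, here is a rough Python sketch. It is a simplified model of ring-walk replica placement (take the first RF distinct nodes clockwise from a token), not Cassandra's actual code. The VM names, the host mapping, the 64 tokens per node and RF=2 are all assumptions made up for illustration.

```python
import random

# Hypothetical layout for option 2: four VMs, two of them on Machine 2.
HOST_OF = {"vm1": "machine1", "vm2a": "machine2", "vm2b": "machine2", "vm3": "machine3"}


def build_ring(nodes, tokens_per_node, rng):
    """Return a sorted list of (token, node) pairs with random tokens."""
    ring = [(rng.random(), node) for node in nodes for _ in range(tokens_per_node)]
    return sorted(ring)


def replicas_for(ring, index, rf):
    """Walk clockwise from ring[index], collecting the first rf distinct nodes."""
    chosen = []
    for step in range(len(ring)):
        node = ring[(index + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


def shared_host_fraction(ring, rf, host_of):
    """Fraction of token ranges whose replicas sit on fewer than rf distinct hosts."""
    bad = 0
    for i in range(len(ring)):
        hosts = {host_of[node] for node in replicas_for(ring, i, rf)}
        if len(hosts) < rf:
            bad += 1
    return bad / len(ring)


# Assumed parameters: 64 tokens per VM, replication factor 2, fixed seed.
rng = random.Random(42)
ring = build_ring(list(HOST_OF), tokens_per_node=64, rng=rng)
fraction = shared_host_fraction(ring, 2, HOST_OF)
print(f"token ranges with both replicas on one physical host: {fraction:.1%}")
```

Any range counted here would lose both of its replicas if Machine 2 went down. With one VM per physical machine the same count is zero by construction, since distinct nodes then always mean distinct hosts.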
Question about alternative 1: With varying vnodes, can I always be sure that
replicas are never put on the same virtual machine?
Yes… mostly. See https://issues.apache.org/jira/browse/CASSANDRA-4123
Or is varying vnodes really only useful/recommended when migrating from
machines with varying hardware (like mentioned in [1])?
Changing the number of vnodes changes the portion of the ring a node is
responsible for. You can use it to account for different types of hardware, but
you can also use it to create awesome situations like hotspots if you aren't
careful… YMMV.
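As a rough illustration of what "proportional" means for option 1, the sketch below derives a per-node token count from free disk space. The helper name, the machine labels and the baseline of 256 tokens for the smallest node are assumptions for the example; the resulting values would go into each node's num_tokens setting in cassandra.yaml.

```python
# Hypothetical helper: scale each node's token count by its free disk space,
# giving the smallest node an assumed baseline of 256 tokens.
def tokens_by_capacity(free_gb, base_tokens=256):
    smallest = min(free_gb.values())
    return {node: round(base_tokens * gb / smallest) for node, gb in free_gb.items()}


# Free space per bare metal machine from the example (Machine 2 has two drives).
free_gb = {"machine1": 150, "machine2": 300, "machine3": 150}

tokens = tokens_by_capacity(free_gb)
total = sum(tokens.values())
# Expected share of the ring per node; real ownership only approximates this
# because token positions are chosen randomly.
shares = {node: count / total for node, count in tokens.items()}

for node in free_gb:
    print(f"{node}: num_tokens={tokens[node]}, expected share={shares[node]:.0%}")
```

Under these assumptions Machine 2 ends up responsible for about half the ring, which is exactly the kind of hotspot to watch for: it also takes about half the read and write traffic, not just half the data.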
At the end of the day I would throw out the extra hard drive / not use it / put
more hard drives in the other machines. Why? Hard drives are cheap and your
time as an admin for the cluster isn't. If you do add more hard drives you can
also split out the commit log etc onto different disks.
I would take fewer problems over trying to draw every last scrap of performance
out of the available hardware any day of the year.
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359