On Wed, Mar 21, 2012 at 3:24 PM, Tom Wilkie <t...@acunu.com> wrote:
> Hi Edward
>
>> 1) No more RAID 0. If a machine is responsible for 4 vnodes, they
>> should correspond to four JBOD disks.
>
> So each vnode corresponds to a disk?  I suppose we could have a
> separate data directory per disk, but I think this should be a
> separate, subsequent change.
>
I think having more micro-ranges makes the process much easier.
Imagine a token ring with tokens 0-30:

Node1 | major range  0-10 | disk1  0-2,  disk2  3-4,  disk3  5-7,  disk4  8-10
Node2 | major range 11-20 | disk1 11-12, disk2 13-14, disk3 15-17, disk4 18-20
Node3 | major range 21-30 | disk1 21-22, disk2 23-24, disk3 25-27, disk4 28-30
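
Roughly, the lookup this mapping implies (a toy Python sketch; the
names and tiny integer tokens are just for illustration - real tokens
would be partitioner hashes):

    # Toy model of the table above: each node's major range is split
    # into one sub-range (vnode) per disk.
    RING = {
        "Node1": {"disk1": (0, 2),   "disk2": (3, 4),   "disk3": (5, 7),   "disk4": (8, 10)},
        "Node2": {"disk1": (11, 12), "disk2": (13, 14), "disk3": (15, 17), "disk4": (18, 20)},
        "Node3": {"disk1": (21, 22), "disk2": (23, 24), "disk3": (25, 27), "disk4": (28, 30)},
    }

    def locate(token):
        """Return the (node, disk) whose sub-range owns this token."""
        for node, disks in RING.items():
            for disk, (lo, hi) in disks.items():
                if lo <= token <= hi:
                    return node, disk
        raise ValueError("token outside the ring")

    print(locate(16))   # -> ('Node2', 'disk3')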

Adding a 4th node is easy:
If you are at the data center, just take disk 4 out of each node and
place it in the new server :)
Software-wise it is the same deal: each node streams only disk 4's
data to the new node.

Now at this point disk 4 is idle on each machine, and each machine
should rebalance its own data across its 4 disks.
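
Software-side, the move is roughly this (same toy ranges as above;
illustrative only, not real Cassandra code):

    # Each existing node donates its disk4 sub-range to the new node,
    # then re-splits its remaining major range over all 4 local disks.
    node1 = {"disk1": (0, 2), "disk2": (3, 4), "disk3": (5, 7), "disk4": (8, 10)}

    def donate_disk4(node):
        """Hand disk4's sub-range to the new node; disk4 goes idle."""
        donated = node["disk4"]
        node["disk4"] = None
        return donated

    def rebalance(node):
        """Re-split the node's remaining range evenly over its 4 disks."""
        kept = [r for r in node.values() if r is not None]
        lo, hi = min(r[0] for r in kept), max(r[1] for r in kept)
        step, extra = divmod(hi - lo + 1, 4)
        start = lo
        for i, disk in enumerate(sorted(node)):
            end = start + step - 1 + (1 if i < extra else 0)
            node[disk] = (start, end)
            start = end + 1

    new_node = {"disk1": donate_disk4(node1)}  # Node4 gets 8-10 (plus 18-20, 28-30 from the others)
    rebalance(node1)
    print(node1)  # {'disk1': (0, 1), 'disk2': (2, 3), 'disk3': (4, 5), 'disk4': (6, 7)}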

> However, do note that making each vnode ~the size of a disk (and only
> having 4-8 per machine) would make any non-hotswap rebuilds slower.  To
> get the fast distributed rebuilds, you need at least as many vnodes
> per node as you have nodes in the cluster.  And you would still need
> the distributed rebuilds to deal with disk failure.
>
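To make that concrete, a rough back-of-envelope (all numbers made up,
and ignoring replication factor):

    # Rebuild parallelism is roughly bounded by the number of vnodes
    # on the failed node: each vnode can stream from a different peer.
    # All numbers below are illustrative only.
    cluster_nodes = 20
    data_gb = 1000                    # data held by the failed node
    for vnodes in (4, 20):
        sources = min(vnodes, cluster_nodes - 1)  # distinct streaming peers
        print(f"{vnodes:>2} vnodes -> {sources:>2} parallel streams, "
              f"{data_gb / sources:.0f} GB per stream")
    #  4 vnodes ->  4 streams, 250 GB each
    # 20 vnodes -> 19 streams, ~53 GB each
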
>> 2) Vnodes should be able to be hot plugged. My normal Cassandra
>> chassis would be a 2U with 6 drive bays. Imagine I have 10 nodes. Now
>> if my chassis dies I should be able to take the disks out and
>> physically plug them into another chassis. Then in Cassandra I should
>> be able to run a command like:
>> nodetool attach '/mnt/disk6'. disk6 should contain all data and its
>> vnode information.
>>
>> Now this would be awesome for upgrades/migrations/etc.
>
> You know, you're not the first person I've spoken to who has asked for
> this!  I do wonder whether it is optimising for the right thing though
> - in my experience, disks fail more often than machines.
>
> Thanks
>
> Tom
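
On the hot-plug idea: "attach" would basically need to read the vnode
metadata stored on the disk and re-register those ranges locally. A
sketch of that flow (entirely hypothetical - the file name, format,
and function below are made up, not a real Cassandra or nodetool API):

    # Hypothetical "attach" flow: read the vnode metadata recorded on
    # the hot-plugged disk and claim those ranges for this node.
    import json, os

    def attach(mount_point, local_ranges):
        """Claim the vnode ranges recorded on a hot-plugged disk."""
        meta_path = os.path.join(mount_point, "vnode_metadata.json")
        with open(meta_path) as f:
            meta = json.load(f)                   # e.g. {"ranges": [[8, 10]]}
        for lo, hi in meta["ranges"]:
            local_ranges[(lo, hi)] = mount_point  # serve this range from the disk
        return meta["ranges"]

    # Usage: ranges = {}; attach("/mnt/disk6", ranges)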
