Thanks for your thoughts, guys.
I agree that with vnodes, total downtime is reduced, although it also
seems that the total number of outages (however short) would be greater.
But I think downtime is only reduced up to a certain cluster size.
I'm thinking that as the cluster continues to grow:
Hi Eric,
The time to recover one node is limited by that node, but the recovery
time that matters most is simply the time to re-replicate the data that
is missing from that node. This is the removetoken operation (called
removenode in 1.2), and it gets faster the more nodes you have.
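A rough way to see why re-replication speeds up with cluster size is an idealized model in which the dead node's data is streamed in parallel by some number of peers, each limited to the same per-node throughput. This is a hypothetical back-of-the-envelope sketch, not how Cassandra actually schedules streaming: the function name, the numbers, and the assumption that only about RF neighbours stream in the non-vnode case are all illustrative, and network, disk, and compaction bottlenecks are ignored.

```python
def removenode_seconds(data_gb: float, peers: int, per_node_mb_s: float) -> float:
    """Idealized time to re-replicate a dead node's data (hypothetical model).

    Assumes the data is split evenly across `peers` streaming nodes, each
    limited to `per_node_mb_s` MB/s, with no other bottleneck.
    """
    if peers < 1:
        raise ValueError("need at least one peer to stream from")
    data_mb = data_gb * 1024
    return data_mb / (peers * per_node_mb_s)


if __name__ == "__main__":
    data_gb, rate, rf = 500, 50, 3  # illustrative numbers only
    # Assumed non-vnode case: only about RF ring neighbours stream the data.
    print(f"non-vnode: {removenode_seconds(data_gb, rf, rate) / 60:.1f} min")
    for n in (6, 12, 48, 96):
        # Assumed vnode case: every surviving node holds a slice of the data.
        minutes = removenode_seconds(data_gb, n - 1, rate) / 60
        print(f"vnodes, {n} nodes: {minutes:.1f} min")
```

Under these assumptions the time falls roughly as 1/(N-1) once the lost data is spread over all surviving nodes, which is the effect described above.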
OK, thanks, Richard. That's good to hear.
However, I still contend that as the node count tends to infinity, the
probability of at least two nodes being down in the cluster at the same
time approaches 100%.
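The claim can be checked with a short calculation, assuming (as the replies below point out, unrealistically) that each node is down independently with the same probability p at any given moment. The function name and the value of p are illustrative. Note that "two nodes down" here does not by itself mean data is unavailable; that depends on whether the two nodes share replicas.

```python
def p_at_least_two_down(n: int, p: float) -> float:
    """P(at least two of n nodes are down at once).

    Assumes each node is down independently with probability p
    (an idealization: real failures are often correlated).
    """
    p_none = (1 - p) ** n                       # no node down
    p_exactly_one = n * p * (1 - p) ** (n - 1)  # exactly one node down
    return 1 - p_none - p_exactly_one


if __name__ == "__main__":
    p = 0.001  # illustrative per-node "currently down" probability
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} nodes: {p_at_least_two_down(n, p):.4f}")
```

For any fixed p > 0 both subtracted terms shrink towards zero as n grows, so the result does approach 1.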
I think of this as somewhat analogous to RAID: I would not be comfortable
Hi Tyler,
You're right, the math does assume independence, which is unlikely to be
accurate. But if you do have correlated failure modes (e.g. shared power,
racks, or DC), then you can still use Cassandra's rack-aware or DC-aware
features to ensure replicas are spread around so your cluster can
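The rack-aware idea can be sketched as follows: walk the ring clockwise from the token's owner and prefer nodes on racks that do not already hold a replica. This is a simplified, single-DC sketch in the spirit of Cassandra's NetworkTopologyStrategy, not its actual implementation; the `Node` tuple, the function name, and the fallback behaviour when there are fewer racks than replicas are assumptions for illustration.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # hypothetical model: (node name, rack name)


def rack_aware_replicas(ring: List[Node], start: int, rf: int) -> List[str]:
    """Pick `rf` replicas walking the ring clockwise from index `start`,
    preferring nodes on racks not yet used (simplified single-DC sketch)."""
    if rf > len(ring):
        raise ValueError("replication factor exceeds node count")
    chosen: List[str] = []
    used_racks = set()
    skipped: List[str] = []
    for i in range(len(ring)):
        name, rack = ring[(start + i) % len(ring)]
        if rack not in used_racks:
            chosen.append(name)
            used_racks.add(rack)
        else:
            skipped.append(name)  # same rack as an earlier replica
        if len(chosen) == rf:
            return chosen
    # Fewer distinct racks than rf: fill up with skipped nodes in ring order.
    return chosen + skipped[: rf - len(chosen)]


if __name__ == "__main__":
    ring = [("n1", "r1"), ("n2", "r1"), ("n3", "r2"),
            ("n4", "r2"), ("n5", "r3"), ("n6", "r3")]
    print(rack_aware_replicas(ring, 0, 3))
```

On this ring a naive clockwise walk from n1 would pick n1, n2, n3 and put two replicas on rack r1; the rack-aware walk picks n1, n3, n5, so losing a whole rack removes at most one replica of each range.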
Assume you need to work with QUORUM in a non-vnode scenario. That means
that if two adjacent nodes in the ring are down, some number of quorum
operations will fail with UnavailableException (or TimeoutException right
after the failures). This is because, for a given range of tokens, quorum
will be
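The effect of adjacent failures can be illustrated with a toy non-vnode ring in which range i is replicated on nodes i, i+1, ..., i+RF-1 (SimpleStrategy-style placement, no racks or vnodes). This is a hypothetical model written for illustration; the function name and ring size are assumptions. A range fails at QUORUM when fewer than RF/2 + 1 of its replicas are up.

```python
def ranges_without_quorum(n: int, rf: int, down: set) -> list:
    """Return the primary ranges (by owning node index) that cannot reach
    QUORUM when the nodes in `down` are unavailable.

    Simplified non-vnode model: range i is replicated on nodes
    i, i+1, ..., i+rf-1 around the ring (SimpleStrategy-style placement).
    """
    quorum = rf // 2 + 1
    lost = []
    for i in range(n):
        replicas = {(i + k) % n for k in range(rf)}
        if len(replicas - down) < quorum:
            lost.append(i)
    return lost


if __name__ == "__main__":
    # 6-node ring, RF=3: two adjacent nodes down vs. two nodes far apart.
    print(ranges_without_quorum(6, 3, {2, 3}))
    print(ranges_without_quorum(6, 3, {0, 3}))
```

With nodes 2 and 3 down, ranges 1 and 2 each keep only one live replica and lose quorum; with nodes 0 and 3 down, every range still has two live replicas.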
Nicolas,
Strictly speaking, your math assumes that failures of different nodes
are probabilistically independent events. This is, of course, not an
accurate assumption for real-world conditions. Nodes share racks,
networking equipment, power, availability zones, data centers, etc.
Good point. Hadoop sprays its blocks around randomly, so if
replication-factor-many nodes are down, some blocks are not found. The
larger the cluster, the higher the chance that that many nodes are down.
To deal with this, increase RF once the cluster gets to be very large.
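To see why raising RF helps under random placement, consider an idealized model in which each block's RF replicas go to RF distinct nodes chosen uniformly at random, independently per block, and a block is lost when all of its replicas sit on down nodes. This models the Hadoop-style placement described above, not Cassandra's token-based placement, and the function name and numbers are illustrative assumptions.

```python
from math import comb


def p_some_block_lost(n: int, down: int, rf: int, blocks: int) -> float:
    """P(at least one of `blocks` blocks has all rf replicas on down nodes).

    Assumes each block's rf replicas are placed on rf distinct nodes chosen
    uniformly at random, independently per block (an idealization).
    """
    # comb(down, rf) is 0 when down < rf, so no block can be lost then.
    p_block = comb(down, rf) / comb(n, rf)
    return 1 - (1 - p_block) ** blocks


if __name__ == "__main__":
    n, down, blocks = 1000, 4, 10**6  # illustrative numbers only
    for rf in (3, 4):
        print(f"rf={rf}: {p_some_block_lost(n, down, rf, blocks):.2e}")
```

For a fixed number of simultaneously down nodes, each extra replica divides the per-block loss probability by roughly n/(down - rf), which is why a larger RF offsets the higher chance of concurrent failures in a big cluster.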
On Wednesday, December 5, 2012, Eric Parusel
Hi Eric,
Your concerns are perfectly valid.
We (Acunu) led the design and implementation of this feature and spent a
long time looking at the impact of such a large change.
We summarized some of our notes and wrote about the impact of virtual nodes
on cluster uptime a few months back: