Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-11 Thread Eric Parusel
Thanks for your thoughts guys. I agree that with vnodes total downtime is lessened. Although it also seems that the total number of outages (however small) would be greater. But I think downtime is only lessened up to a certain cluster size. I'm thinking that as the cluster continues to grow:

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-11 Thread Richard Low
Hi Eric, The time to recover one node is limited by that node, but the time to recover that's most important is just the time to replicate the data that is missing from that node. This is the removetoken operation (called removenode in 1.2), and this gets faster the more nodes you have.

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-11 Thread Eric Parusel
Ok, thanks Richard. That's good to hear. However, I still contend that as node count increases to infinity, the probability of there being at least two node failures in the cluster at any time would increase to 100%. I think of this as somewhat analogous to RAID -- I would not be comfortable
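
The claim above, that the chance of at least two simultaneous node failures approaches 100% as the cluster grows, can be checked with a small sketch. It assumes each node is down independently with the same probability; the value 0.001 is a hypothetical figure chosen for illustration, not a measured one, and the function name is made up for this example.

```python
def prob_at_least_two_down(n, p):
    """P(at least 2 of n nodes are down), assuming each node is down
    independently with probability p (an idealised assumption)."""
    none_down = (1 - p) ** n
    exactly_one_down = n * p * (1 - p) ** (n - 1)
    return 1 - none_down - exactly_one_down


# p = 0.001 is a hypothetical per-node downtime fraction.
for n in (10, 100, 1000, 10000):
    print(n, prob_at_least_two_down(n, 0.001))
```

Under these assumptions the probability rises towards 1 as n grows, which matches the intuition in the message. It says nothing about whether the two failed nodes share a replica set, which is what determines actual unavailability.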

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-10 Thread Richard Low
Hi Tyler, You're right, the math does assume independence, which is unlikely to be accurate. But if you do have correlated failure modes (e.g. same power, racks, DC, etc.), then you can still use Cassandra's rack-aware or DC-aware features to ensure replicas are spread around so your cluster can

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-10 Thread Edward Capriolo
Assume you need to work with quorum in a non-vnode scenario. That means that if 2 adjacent nodes in the ring are down, some number of quorum operations will fail with UnavailableException (TimeoutException right after the failures). This is because for a given range of tokens quorum will be
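
The adjacent-failure case described above can be illustrated with a toy model. This is a simplified sketch, not Cassandra's placement code: it assumes one token per node, that the range owned by node r is replicated on the next rf nodes clockwise, and a replication factor of 3. The function name and parameters are hypothetical.

```python
def unavailable_ranges(n_nodes, rf, down):
    """Return the primary ranges that cannot reach quorum.

    Toy model (an assumption, not Cassandra's implementation): range r
    is stored on nodes r, r+1, ..., r+rf-1 around the ring.
    """
    quorum = rf // 2 + 1
    failed = []
    for r in range(n_nodes):
        replicas = [(r + k) % n_nodes for k in range(rf)]
        alive = sum(1 for node in replicas if node not in down)
        if alive < quorum:
            failed.append(r)
    return failed


# Six nodes, RF=3: two adjacent nodes down versus two nodes far apart.
print(unavailable_ranges(6, 3, {2, 3}))
print(unavailable_ranges(6, 3, {0, 3}))
```

In this model, two neighbouring failures leave some ranges with only one live replica, so quorum fails for them, while two failures far enough apart on the ring leave every range with two live replicas.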

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-09 Thread Tyler Hobbs
Nicolas, Strictly speaking, your math makes the assumption that the failures of different nodes are probabilistically independent events. This is, of course, not an accurate assumption for real world conditions. Nodes share racks, networking equipment, power, availability zones, data centers, etc.

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-07 Thread Edward Capriolo
Good point. Hadoop sprays its blocks around randomly. Thus if as many nodes as the replication factor are down, some blocks cannot be found. The larger the cluster, the higher the chance that nodes are down. To deal with this, increase the RF once the cluster gets to be very large. On Wednesday, December 5, 2012, Eric Parusel

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-07 Thread Nicolas Favre-Felix
Hi Eric, Your concerns are perfectly valid. We (Acunu) led the design and implementation of this feature and spent a long time looking at the impact of such a large change. We summarized some of our notes and wrote about the impact of virtual nodes on cluster uptime a few months back: