Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Joseph Lynch
As far as I'm aware, if you're using a high number of tokens per host, you can't bootstrap two hosts without potentially violating RaW consistency if they have overlapping token ranges (with 256 this is basically guaranteed). I'm definitely not an expert on this though; when I've used vnodes I've
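
A rough sketch of the "basically guaranteed" overlap claim above. It is not taken from the thread or the paper: it assumes tokens are placed uniformly at random on the ring and that replicas are the next RF distinct hosts clockwise (a SimpleStrategy-like rule). The names `shares_replica_set` and `overlap_fraction` are hypothetical. It only estimates how often two simultaneously joining hosts end up in a common replica set; it does not model the consistency violation itself.

```python
import random


def shares_replica_set(existing_hosts, tokens_per_host, rf, rng):
    """Return True if two joining hosts appear together in some replica set."""
    if existing_hosts + 2 < rf:
        raise ValueError("need at least rf distinct hosts on the ring")
    hosts = [f"old{i}" for i in range(existing_hosts)] + ["newA", "newB"]
    # Assumption: every host picks its tokens uniformly at random on a unit ring.
    ring = sorted((rng.random(), h) for h in hosts for _ in range(tokens_per_host))
    owners = [h for _, h in ring]
    size = len(owners)
    for start in range(size):
        replicas = []
        i = start
        # Walk clockwise, collecting distinct hosts until we have rf replicas.
        while len(replicas) < rf:
            h = owners[i % size]
            if h not in replicas:
                replicas.append(h)
            i += 1
        if "newA" in replicas and "newB" in replicas:
            return True
    return False


def overlap_fraction(existing_hosts, tokens_per_host, rf=3, trials=20, seed=0):
    """Fraction of random trials in which the two joining hosts overlap."""
    rng = random.Random(seed)
    hits = sum(
        shares_replica_set(existing_hosts, tokens_per_host, rf, rng)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print("256 tokens per host:", overlap_fraction(12, 256, trials=10))
    print("1 token per host:   ", overlap_fraction(100, 1, trials=200))
```

Under these assumptions, the overlap fraction with many tokens per host should sit at or near 1, while single-token hosts in a larger cluster overlap far less often.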

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
Is this a fundamental vnode disadvantage: do Vnodes preclude cluster expansion faster than 1 at a time? I would think with manual management you could expand a datacenter by multiples of machines/nodes. Or at least in multiples of ReplicationFactor: RF3 starts as: a1 b1 c1 doubles to: a1 a2

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Joseph Lynch
I'm pretty worried about large clusters using removenode, given my experience with Elasticsearch. Elasticsearch shard recovery is basically removenode + bootstrap, and it does work really quickly if not throttled, but it completely destroys latency-sensitive clusters (P99s spike to multiple hundreds

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Richard Low
I'm also not convinced the problems listed in the paper with removenode are so serious. With lots of vnodes per node, removenode causes data to be streamed into all other nodes in parallel, so it is (n-1) times quicker than replacement for n nodes. For R=3, the failure rate goes up with vnodes
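
The (n-1) speedup above can be checked with back-of-the-envelope arithmetic. This is a minimal sketch under idealized assumptions that are not stated in the thread: recovery is bound by how fast each receiving node can ingest data, the lost node's data is spread evenly over the other n-1 nodes, and there is no throttling. The function names and the example figures are hypothetical.

```python
def replacement_hours(data_gb, ingest_gb_per_hour):
    """One replacement node must ingest the whole dataset of the lost node."""
    return data_gb / ingest_gb_per_hour


def removenode_hours(data_gb, ingest_gb_per_hour, n_nodes):
    """The lost node's data is split evenly across the n-1 surviving nodes,
    which all ingest their share in parallel."""
    receivers = n_nodes - 1
    return (data_gb / receivers) / ingest_gb_per_hour


if __name__ == "__main__":
    n = 11
    slow = replacement_hours(1000, 100)
    fast = removenode_hours(1000, 100, n)
    print(f"replacement: {slow} h, removenode: {fast} h, ratio: {slow / fast}")
```

In this model the ratio of the two times is exactly n-1; real clusters will see less because of throttling, uneven ownership, and compaction load.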

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
I've posted a bunch of things relevant to commitlog --> sstable and associated compaction / sstable metadata changes on here. I really need to learn that section of the code.
On Tue, Apr 17, 2018 at 10:29 AM, Jeff Jirsa wrote:
> There are two huge advantages
>
> 1) during

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Jeff Jirsa
There are two huge advantages:
1) during expansion / replacement / decom, you stream from far more ranges. Since streaming is single threaded per stream, this enables you to max out machines during streaming where single token doesn’t
2) when adjusting the size of a cluster, you can often grow

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
Do vnodes address anything besides relieving cluster planners of manual token range management on nodes? Do we have a centralized list of advantages they provide beyond that? There seem to be lots of downsides: 2i index performance, the above availability, etc. I also wonder if in

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread kurt greaves
Great write-up. Glad someone finally did the math for us. I don't think this will come as a surprise to many of the developers. Availability is only one issue raised by vnodes. Load distribution and performance are also pretty big concerns. I'm always a proponent of fixing vnodes, and removing

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-16 Thread Joseph Lynch
If the blob link on github doesn't work for the pdf (looks like mobile might not like it), try: https://github.com/jolynch/python_performance_toolkit/raw/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf -Joey

Quantifying Virtual Node Impact on Cassandra Availability

2018-04-16 Thread Joseph Lynch
Josh Snyder and I have been working on evaluating virtual nodes for large-scale deployments, and while it seems like there is a lot of anecdotal support for reducing the vnode count [1], we couldn't find any concrete math on the topic, so we had some fun and took a whack at quantifying how