On Tue, Mar 20, 2012 at 9:08 AM, Eric Evans <eev...@acunu.com> wrote:
> On Tue, Mar 20, 2012 at 8:39 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> I like this idea. It feels like a good 80/20 solution -- 80% of the
>> benefits, 20% of the effort. More like 5% of the effort. I can't
>> even enumerate all the places full vnode support would change, but an
>> "active token range" concept would be relatively limited in scope.
>
> It only addresses 1 of Sam's original 5 points, so I wouldn't call it
> an "80% solution".

I guess a more accurate way to put this is, "only 20% of Sam's list is
an actual pain point that doesn't get addressed by The Rick Proposal
[TRP]."

Here's how I see Sam's list:

* Even load balancing when growing and shrinking the cluster

Nice to have, but post-bootstrap load balancing works well in practice
(and is improved by TRP).

* Greater failure tolerance in streaming

Directly addressed by TRP.

* Evenly distributed impact of streaming operations

Not a problem in practice with stream throttling.

* Possibility for active load balancing

Not really a feature of vnodes per se, but as with the other load
balancing point, this is also improved by TRP.

* Distributed rebuild

This is the 20% that TRP does not address. Nice to have? Yes. Can I
live without it? I have so far. Is this alone worth the complexity of
vnodes? No, it is not. Especially since there are probably other
approaches that we can take to mitigate this, one of which Rick has
suggested in a separate sub-thread.

>> Full vnodes feels a lot more like the counters quagmire, where
>> Digg/Twitter worked on it for... 8? months, and then DataStax worked
>> on it about for about 6 months post-commit, and we're still finding
>> the occasional bug-since-0.7 there. With the benefit of hindsight, as
>> bad as maintaining that patchset was out of tree, committing it as
>> early as we did was a mistake. We won't do that again. (On the
>> bright side, git makes maintaining such a patchset easier now.)
>
> And yet counters have become a very important feature for Cassandra;
> We're better off with them, than without.

False dichotomy (we could have waited for a better counter design),
but that's mostly irrelevant to my point that jamming incomplete code
in-tree to sort out later is a bad idea.

> I think there were a number of problems with how counters went down
> that could be avoided here. For one, we can take a phased,
> incremental approach, rather than waiting 8 months to drop a large
> patchset.

If there are incremental improvements to be made that justify
themselves independently, then I agree. Small, self-contained steps
are a good thing. A good example is
https://issues.apache.org/jira/browse/CASSANDRA-2319, a product of The
Grand Storage Engine Redesign of 674 fame.

But when things don't naturally break down into such mini-features,
then I'm -1 on committing code that has no purpose other than to be a
foundation for later commits. I've seen people get bored or assigned
to other projects too often to just trust that those later commits
will indeed be forthcoming. Or even if Sam [for instance] is still
working hard on it, it's very easy for unforeseen difficulties to come
up that invalidate the original approach. Since we were talking about
counters, the original vector clock approach -- that we ended up
ripping out, painfully -- is a good example.

Once bitten, twice shy.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com