On Thu, Jun 16, 2011 at 1:05 PM, AJ <a...@dude.podzone.net> wrote:
> On 6/16/2011 10:58 AM, Dan Hendry wrote:
>>
>> I think this would add a lot of complexity behind the scenes and be
>> conceptually confusing, particularly for new users.
>
> I'm not so sure about this. Cass is already somewhat sophisticated, and I
> don't see how this could trip up anyone who can already grasp the basics.
> The only thing I am adding to the CL concept is the notion of available
> replication nodes versus total replication nodes. But don't forget: a
> competitor to Cass is probably in the works this very minute, so constant
> improvement is a good thing.
There are already many competitors.

>> The Cassandra consistency model is pretty elegant, and this type of
>> approach breaks that elegance in many ways. It would also only really be
>> useful when the value has a high probability of being updated between a
>> node going down and the value being read.
>
> I'm not sure what you mean. A node can be down for days, during which time
> the value can be updated. The intention is to use the nodes available even
> if they fall below the RF. If there is only 1 node available for accepting
> a replica, that should be enough given the conditions I stated and updated
> below.

If this is your constraint, then you should just use CL.ONE.

>> Perhaps the simpler approach, which is fairly trivial and does not
>> require any Cassandra change, is to simply downgrade your read from ALL
>> to QUORUM when you get an unavailable exception for this particular read.
>
> It's not so trivial, especially since you would have to build that into
> your client at many levels. I think it would be more appropriate (if this
> idea survives) to put it into Cass.
>>
>> I think the general answer for 'maximum consistency' is QUORUM
>> reads/writes. Based on the fact you are using CL=ALL for reads, I assume
>> you are using CL=ONE for writes: this itself strikes me as a bad idea if
>> you require 'maximum consistency for one critical operation'.
>>
> Very true. Specifying QUORUM for BOTH reads and writes provides the 100%
> consistency because of the overlap of the two quorums. But only if the
> number of available nodes is not < RF.

No, it will work as long as the number of available nodes is >= RF/2 + 1.

> Upon further reflection, this idea can be used for any consistency level.
> The general thrust of my argument is: if a particular value can be
> overwritten by one process regardless of its prior value, then that
> implies that the value in the down node is no longer up-to-date and can
> be disregarded. Just work with the nodes that are available.
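The quorum-overlap point above can be checked with simple arithmetic: a QUORUM read and a QUORUM write each touch floor(RF/2) + 1 replicas, so any read set must share at least one replica with the latest write set. A minimal sketch in plain Python (not tied to any Cassandra client API, just the counting argument):

```python
# Quorum-overlap arithmetic: with read and write quorums of size
# floor(RF/2) + 1, any read quorum and any write quorum must intersect,
# so a QUORUM read always reaches at least one replica holding the
# latest QUORUM write.

def quorum(rf):
    """Replicas required for a QUORUM read or write at replication factor rf."""
    return rf // 2 + 1

def quorums_overlap(rf):
    """True if every read quorum shares >= 1 replica with every write quorum."""
    r = w = quorum(rf)
    # Worst case: the two quorums are chosen to overlap as little as possible.
    min_overlap = r + w - rf
    return min_overlap >= 1

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, overlap guaranteed={quorums_overlap(rf)}")
```

This also shows where the "available nodes >= RF/2 + 1" reply comes from: that is exactly the quorum size, the minimum number of live replicas needed for a QUORUM operation to succeed at all.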
> Actually, now that I think about it...
>
> ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the
> value > the latest unavailability time of all unavailable replica nodes
> for that value's row key. Unavailable is defined as a node's Cass process
> that is not reachable from ANY node in the cluster in the same data
> center. If the node in question is available to at least one node, then
> the read should fail, as there is a possibility that the value could have
> been updated some other way.

Node A can't reliably and consistently know whether node B and node C can
communicate.

> After looking at the code, it doesn't look like it will be difficult.
> Instead of skipping the request for values from the nodes when CL nodes
> aren't available, it would have to go ahead and request the values from
> the available nodes as usual, then look at the timestamps (which it does
> anyway) and compare them to the latest unavailability time of the
> relevant replica nodes. The code that keeps track of which nodes are down
> simply records the time each went down. But I've only been looking at the
> code for a few days, so I'm not claiming to know everything by any
> stretch.

-ryan
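The proposed ALL_AVAIL rule boils down to one timestamp comparison. A hypothetical sketch of that check (function and parameter names are invented for illustration; this is not actual Cassandra code):

```python
# Hypothetical sketch of the proposed ALL_AVAIL acceptance rule: a read
# may be served from only the reachable replicas iff the freshest value
# among them is newer than the moment every unreachable replica went
# down (a down node cannot hold a write issued after its outage began).

def all_avail_read_ok(value_timestamp, down_since):
    """value_timestamp: timestamp of the freshest value among live replicas.
    down_since: times at which each unreachable replica was marked down.
    Returns True if the read can be answered consistently under the proposal."""
    return all(value_timestamp > t for t in down_since)

# A replica that went down at t=100 can be disregarded if the value was
# written at t=150: the down node cannot hold anything newer.
print(all_avail_read_ok(150, [100]))      # True
# But if the value predates the outage, the down node might be fresher.
print(all_avail_read_ok(90, [100, 120]))  # False
```

Note this sketch inherits the objection raised inline above: it is only sound if "down since" is agreed on cluster-wide, which gossip-based failure detection cannot reliably provide.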