On 07/02/2009, at 4:47 AM, Jim McCoy wrote:
Returning, as always, to Brewer's CAP conjecture, you can have any two
of consistency, availability, or partition-tolerance. Scalaris went
with Paxos sync and selected consistency and partition-tolerance.
This means that if you lose a majority of your Scalaris nodes it
becomes impossible to log a write to the system. In CouchDB
availability and partition-tolerance were prioritized ("eventual
consistency" being that pleasant term we use to play down the fact
that no consistency assurances can actually be made beyond a
best-effort attempt to fix things after the fact).
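The practical consequence of that trade-off is easy to state: a Paxos-style write only commits while a strict majority of replicas is reachable. A minimal sketch of the constraint (the numbers are illustrative, not Scalaris's actual membership logic):

    # A write is accepted only if a strict majority of replicas can acknowledge it.
    def can_commit_write(cluster_size: int, reachable_nodes: int) -> bool:
        majority = cluster_size // 2 + 1
        return reachable_nodes >= majority

    # With 5 nodes: 3 reachable is enough, 2 reachable is not.
    assert can_commit_write(5, 3)
    assert not can_commit_write(5, 2)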
I think we're talking at cross-purposes here. I'm talking about
transactions over a couch cluster, not over an arbitrary set of peers.
The proposed mods to the API are to allow the cluster to present the
illusion of a single server even if it's implemented over multiple
nodes. I have yet to see the definition of multi-node that has
prompted this change; it's referred to alternately as multi-node and
partitioned, where partitioned means data-partitioning, not node
connectivity. But the talk of data-partitioning, together with the
fact that cluster-side code is required, gives some clues.
What distinguishes such a cluster from an arbitrary group of peers?
Well, for one, there is some additional server code/functionality that
allows a single entry point to the cluster, and provides the semantics
and appearance of a single server. And the rest of the paragraph you
refer to is:
Why compromise the API for something that is so ephemeral that
conflict management isn't feasible? What IS feasible? View
consistency? MVCC semantics? If I write a document and then read
it, do I maybe see an earlier document? What about in the view?
Because if views are going to have a consistency guarantee wrt.
writes, then that looks to me like distributed MVCC. Is this not
2PC? And if views aren't consistent, then why bother? Why not have
a client layer that does content-directed load balancing?
My point is that when talking about a cluster you probably already
have some consistency constraints above and beyond those of an
arbitrary set of peers. So don't these extra constraints already
provide an environment in which an ACID operation can be offered?
And if there are no extra constraints, then I think such a cluster
devolves to something that could be done using a load balancing layer.
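To make the read-your-writes question concrete, here is a rough sketch of the check an application would want to rely on (assuming the Python requests library and hypothetical node URLs; the staleness test leans on CouchDB's "generation-hash" _rev format):

    # Write to one node of the cluster, then read back - possibly served by
    # another node behind the same single-server facade.
    import requests

    WRITE_NODE = "http://node-a:5984/db"   # hypothetical endpoints
    READ_NODE = "http://node-b:5984/db"

    def write_then_read(doc_id, doc):
        resp = requests.put(f"{WRITE_NODE}/{doc_id}", json=doc)
        resp.raise_for_status()
        written_rev = resp.json()["rev"]

        read_doc = requests.get(f"{READ_NODE}/{doc_id}").json()
        read_rev = read_doc.get("_rev", "0-none")

        # A lower revision generation means the read did not see the write.
        stale = int(read_rev.split("-")[0]) < int(written_rev.split("-")[0])
        return read_doc, stale

If the cluster offers MVCC-style consistency, stale should never be True; if it doesn't, the API can't hide that from the application.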
I believe the "unbounded number of servers" that Damien was mentioning
was referring to the idea that with multi-node operations the activity
over which you want an ACID commit could become a very large set if
the number of participating nodes grows significantly. This would
mean that a large system would require an ever-growing number of
messages to be passed around, which is a scalability bottleneck.
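As a back-of-envelope illustration of that bottleneck (assuming a textbook coordinator-driven two-phase commit, roughly four messages per participant per transaction, ignoring retries and recovery):

    # Messages per transaction for 2PC: prepare, vote, decision, ack per participant.
    def two_phase_commit_messages(participants: int) -> int:
        return 4 * participants

    for n in (3, 10, 100, 1000):
        print(f"{n} participants -> ~{two_phase_commit_messages(n)} messages per commit")

The per-commit cost grows linearly with the participant set, and the coordinator has to wait on the slowest participant in each round.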
It is simply not possible to offer what you want within CouchDB unless
either availability or partition-tolerance is sacrificed.
I'm making the presumption, on the basis of this discussion referring
to data partitioning, that multi-node Couch clusters are not
partition-tolerant, as opposed to e.g. p2p replication meshes. It's
the difference between these two scenarios, and the way that
difference can be taken advantage of within Couch applications, that
prompts my comments about the nature of the UI that is presented for
local (clustered) vs. distributed operations, e.g.:
Regardless, this discussion is also about whether supporting a
single-node style of operation is useful, because CouchDB had an
ACID _bulk_docs. IMO it's also about the danger/cost of
abstracting over operational models with considerably different
operational characteristics - cf. transparent distribution in
object models.
and:
The use-case of someone doing a local operation e.g. submitting a
web form, is very different from resolving replication conflicts.
Conflict during a local operation is a matter of application
concurrency, whereas conflict during replication is driven by the
overall system model. It has different temporal, administrative and
UI boundaries.
In short, I think it is a mistake to try to hide the different
characteristics of local (even clustered) operations and
replication. You may disagree, but if the system distinguishes
between these two fundamentally different things (distinguished by
their partition-tolerance), you can still code as though every
operation leads to conflict if you wish; whereas if the system
doesn't distinguish between them, I can't take advantage of the
difference. Distinguishing between the two cases allows for a wider
range of uses and application models.
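As a sketch of what that distinction buys an application, here are the two conflict cases side by side (assuming the Python requests library, a hypothetical local database URL, and CouchDB's documented 409 / _conflicts behaviour):

    import requests

    DB = "http://localhost:5984/db"   # hypothetical

    # Case 1: local (clustered) write conflict - application concurrency.
    # A PUT with a stale _rev fails immediately with 409, so the app can
    # re-read, merge and retry within the same request/response cycle.
    resp = requests.put(f"{DB}/order-42", json={"_rev": "1-abc", "status": "paid"})
    if resp.status_code == 409:
        current = requests.get(f"{DB}/order-42").json()
        # merge with `current` and retry using current["_rev"]

    # Case 2: replication conflict - system model.
    # Both writes succeeded on different nodes; the conflict only appears
    # later as sibling revisions, with resolution on its own timescale.
    doc = requests.get(f"{DB}/order-42", params={"conflicts": "true"}).json()
    for losing_rev in doc.get("_conflicts", []):
        pass  # fetch each sibling, merge, then delete the losers

The first case belongs in request-handling code; the second belongs to whatever administrative or UI process owns conflict resolution.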
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Isn't it enough to see that a garden is beautiful without having to
believe that there are fairies at the bottom of it too?
-- Douglas Adams