On Feb 8, 2009, at 11:27 PM, Antony Blakey wrote:
On 09/02/2009, at 2:35 PM, Paul Davis wrote:
There is no concept of an "MVCC boundary" anywhere in the code that
I'm aware of.
Database updates create an MVCC commit, and reads are all wrt an MVCC
commit. MVCC boundaries, i.e. commit points, are a fundamental part
of the Couch low-level architecture. When _bulk_docs was ACID, they
were exposed in the user-level API.
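
To make the commit-point idea concrete, here's a toy sketch in Python
(illustrative only, not CouchDB's actual storage code): every update
produces a new immutable commit, and every read is served from exactly
one commit, never a partial update.

class ToyMVCCStore:
    def __init__(self):
        self._commits = [{}]          # commit 0: the empty database

    def update(self, docs):
        # Apply a batch of writes as one new commit (one MVCC boundary).
        new_state = dict(self._commits[-1])
        new_state.update(docs)
        self._commits.append(new_state)
        return len(self._commits) - 1  # the new commit's sequence number

    def snapshot(self, seq=None):
        # Readers always see exactly one commit.
        if seq is None:
            seq = len(self._commits) - 1
        return self._commits[seq]

store = ToyMVCCStore()
seq = store.update({"a": 1, "b": 2})   # both docs land in one commit
reader = store.snapshot(seq)           # read pinned at that commit point
store.update({"a": 99})                # later commits don't disturb it
assert reader["a"] == 1
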
I think the bigger point here is that what you're asking for violates
a huge swath of assumptions baked into the core of CouchDB. Asking
CouchDB to do consistent inter-document writes is going to require
you
to either change a large amount of internal code or write some very
specific app code to get what you want.
But it already did consistent inter-document writes - the removal of
that is what this discussion is about.
You may be able to get atomic
interdocument updates on a single node, but this is violated if you
do
so much as try and replicate.
And 'so much as try and replicate' is the issue, because the
replication model varies for different use cases. In my previous
posts you'll see that I'm promoting the idea that the local,
exclusive-replication use case is significant and useful. There are
useful models where replication is a fundamentally different
operation than local use.
IMO, it would be better to not support _bulk_docs for exactly this
reason. People that use _bulk_docs will end up assuming that the
atomic properties carry over into places where they don't actually
hold.
But it can for local operations, and replication conflicts can be
dealt with separately from normal operation.
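
For reference, the call under discussion looks roughly like this on
the wire (the URL, database name and documents are placeholders; the
endpoint itself is real CouchDB API). Under the pre-0.9 semantics this
thread is about, the whole batch lands as a single commit locally, but
nothing carries that atomicity across replication.

import json
import urllib.request

batch = {"docs": [
    {"_id": "account-a", "balance": 90},
    {"_id": "account-b", "balance": 110},
]}
req = urllib.request.Request(
    "http://localhost:5984/mydb/_bulk_docs",   # placeholder URL
    data=json.dumps(batch).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))   # one result entry per document
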
It occurs to me that once you get to the point of writing source and
target database locking, you no longer need _bulk_docs. You'd have
enough code to do all the atomic interdoc writes you need.
Only by giving up all local concurrency. Locking is only wrt
replication vs. local operation. And I think the most recent emails
are showing that source locking is not as black-and-white as you
think - it's only wrt compaction, and even then I think it's
restricted to a requirement not to compact past the MVCC state being
used by the replication process, which IMO is a trivial issue
because compaction cannot invalidate the head MVCC state, and a
replication request will always use the head state in effect at
request-time.
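
To make that argument concrete (again a toy model, not CouchDB
internals): compaction drops history but by definition keeps the head
commit, so a replication pass that pinned the head at request time
still has a valid snapshot afterwards.

class ToyStore:
    def __init__(self):
        self.commits = [{}]          # commit history, oldest first

    def update(self, docs):
        state = dict(self.commits[-1])
        state.update(docs)
        self.commits.append(state)

    def head(self):
        return self.commits[-1]      # what a replication request pins

    def compact(self):
        # Compaction discards history but keeps the head commit intact.
        self.commits = [self.commits[-1]]

store = ToyStore()
store.update({"doc1": "v1"})
store.update({"doc1": "v2"})
replication_snapshot = store.head()  # pinned at request time
store.compact()                      # cannot invalidate the head
assert replication_snapshot["doc1"] == "v2"
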
Though it'd
be rather un-couchy.
CouchDB has wide applicability, and what you regard as un-couchy is
only relative to a certain use-case. I'm trying to promote a more
generous interpretation of what CouchDB is, and can be.
I see the critical problem as being consistent updates during
replication. Unless you do it in one big transaction, the intermediate
replication states of the database are inconsistent, so the target
database is unusable during replication. A bulk transaction is limited
in how many docs it can handle, so it only works for smallish
databases. That alone means MVCC replication isn't useful in the
general case.
But for your purposes, it's maybe possible. You'll need to write a
special replicator and create a single HTTP request to give you
everything from the source in one go. Then you'll need to write-lock
(or disable) the target database during replication, unless they are
always small databases in which case the special replicator can use a
single bulk transaction.
You'll also need to serialize the updates to the database in the
application layer and add the conflict checking there. That will give
you the desired transaction semantics.
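
A minimal sketch of that special replicator, assuming an empty target,
databases small enough for one request each, and a target that is
write-locked or otherwise idle for the duration - the endpoint names
are real CouchDB API, everything else is placeholder:

import json
import urllib.request

SOURCE = "http://localhost:5984/source_db"   # placeholder URLs
TARGET = "http://localhost:5984/target_db"

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def one_shot_replicate():
    # 1. Everything from the source in a single HTTP request.
    rows = fetch_json(SOURCE + "/_all_docs?include_docs=true")["rows"]
    docs = []
    for row in rows:
        doc = dict(row["doc"])
        # Initial load into an empty target; an incremental replicator
        # would need real revision handling instead of stripping _rev.
        doc.pop("_rev", None)
        docs.append(doc)

    # 2. One bulk write to the target; with the atomic _bulk_docs this
    #    thread discusses, the target never shows an intermediate state.
    req = urllib.request.Request(
        TARGET + "/_bulk_docs",
        data=json.dumps({"docs": docs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)       # per-doc results, conflicts included

if __name__ == "__main__":
    for result in one_shot_replicate():
        if "error" in result:
            # This is the app-layer conflict checking described above.
            print("conflict to resolve:", result)
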
If you build what I've described, and assuming you can live with the
limitations, you will have an always-consistent, one-way-replication
document distribution platform.
-Damien
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Human beings, who are almost unique in having the ability to learn
from the experience of others, are also remarkable for their
apparent disinclination to do so.
-- Douglas Adams