Re: Where to add documentation for bulk updates

Brian Candler Wed, 25 Mar 2009 01:47:05 -0700

On Tue, Mar 24, 2009 at 09:03:55AM -0700, David Van Couvering wrote:
> Thanks for fixing this, Brian.  I'm not sure I'm totally happy with these
> semantics.  Unless I am missing something (more than possible as I'm still
> learning CouchDB), for a bulk update with N documents, you would have to do
> 1 round-trip for the update and N round-trips to check for conflicts (or, if
> not using all-or-nothing, N round-trips to check and see if the update was
> successful or not).


Unfortunately, as far as I can see the _all_docs view doesn't support
?conflicts=true. If it did, a single round trip could retrieve the conflict
status of all the documents of interest.

However, if you are doing this, it suggests you are doing something wrong,
because you will almost certainly end up with race conditions. For example:
you could retrieve the _all_docs and find that none of them are in conflict
- hooray! - but one millisecond later someone may replicate from another
database which will introduce new conflicts. You have no control over this
in the flow of the first part of your application.

So you really need a strategy for dealing with conflicts which is
asynchronous to application updates. For example:

- every time you *read* a document, check for conflicts and resolve them at
  that point in time

- have a periodic sweep for conflicts (using a view to find them) and
  resolve all those found.

I think only the first is going to give suitable semantics for end-users of
traditional database-style applications. Unless you do this, users are going
to observe updated versions of documents simply "vanishing" from the
database after replication (since a different, conflicting version is taking
precedence over the first one), to reappear at some future point in time.
They will cry "bug".

However, performing this resolution on every single read is a complete PITA
for the client side to implement, and easy to forget. To centralise and
enforce this you really need some sort of proxy layer between the client and
the database so that every doc read and view read has this logic performed.
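
As a rough illustration of what such a read-side layer might do, here is a
minimal Python sketch. Everything in it is an assumption rather than an
existing API: `fetch(doc_id, None)` stands in for GET /db/doc?conflicts=true,
`fetch(doc_id, rev)` for GET /db/doc?rev=..., `bulk_save` for a POST to
_bulk_docs, and the "newest `updated_at` wins" policy is purely hypothetical
(a real application would probably merge fields instead).

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


def pick_winner(candidates: List[Doc]) -> Doc:
    # Hypothetical policy: the newest numeric "updated_at" wins; ties are
    # broken by _rev so that every client picks the same revision.
    return max(candidates, key=lambda d: (d.get("updated_at", 0), d["_rev"]))


def read_resolved(
    doc_id: str,
    fetch: Callable[[str, Optional[str]], Doc],
    bulk_save: Callable[[List[Doc]], object],
) -> Doc:
    # fetch(doc_id, None) is assumed to return the current winning revision,
    # with a "_conflicts" list of losing revision ids when conflicts exist.
    current = fetch(doc_id, None)
    conflict_revs = current.get("_conflicts") or []
    if not conflict_revs:
        return current

    # Load every conflicting revision and choose one to keep.
    candidates = [current] + [fetch(doc_id, rev) for rev in conflict_revs]
    winner = pick_winner(candidates)

    # Delete every other leaf revision so only the chosen one remains.
    deletions = [
        {"_id": doc_id, "_rev": d["_rev"], "_deleted": True}
        for d in candidates
        if d["_rev"] != winner["_rev"]
    ]
    bulk_save(deletions)

    return {k: v for k, v in winner.items() if k != "_conflicts"}
```

Note that this is itself racy: another writer or a replication can change the
document between the read and the deletes, so the individual results of the
bulk request still need checking, and the next read may find fresh conflicts.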

This is where I think the introductory material on couchdb.apache.org does
CouchDB a big disservice by overhyping: it implies strongly that
distributed, replicated applications are easy to write with CouchDB, but I
don't think the reality matches that, at least not yet.

Of course, CouchDB is applied successfully to many projects. I have a
suspicion that most of them either (a) don't use replication at all, or (b)
replicate mainly in a master->slave fashion, or (c) follow an "append-only"
pattern (new documents are added, but old documents are rarely modified). In
these cases, the conflict issue is dodged entirely, which means CouchDB's
built-in "support" for replication conflict handling is moot.

> Isn't there any way for the response to a bulk update to tell you which
> documents have conflicts or failures?

Sure. In the default operation of POST to _bulk_docs, you will get an
individual success or failure response for each document, and no conflicts
will be introduced. This is exactly the same as PUTing each document
individually, but at a lower HTTP overhead.
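
A small Python sketch of how a client might sort those per-document results.
It assumes the response body is a JSON array with one object per submitted
document, where a saved document carries "id" and "rev" and a rejected one
carries "id" and "error"; check this against the CouchDB version you are
running, since the response format has changed between releases. The function
name `split_bulk_results` is made up for this example.

```python
import json
from typing import Dict, Tuple


def split_bulk_results(response_body: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Returns (saved, failed): saved maps doc id -> new rev,
    # failed maps doc id -> error name (e.g. "conflict").
    saved: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for row in json.loads(response_body):
        if "error" in row:
            failed[row["id"]] = row["error"]
        else:
            saved[row["id"]] = row["rev"]
    return saved, failed
```

The documents listed in `failed` can then be re-read and retried individually,
with no extra round trip needed for the ones that were saved.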

I'm not sure who will use the new "all_or_nothing":true, or why. Presumably
it's possible to build some useful transactional semantics on top of this,
with care. Maybe we will have to buy the book to find out :-)

Regards,

Brian.
