On Tue, Mar 24, 2009 at 09:03:55AM -0700, David Van Couvering wrote:
> Thanks for fixing this, Brian. I'm not sure I'm totally happy with these
> semantics. Unless I am missing something (more than possible as I'm still
> learning CouchDB), for a bulk update with N documents, you would have to do
> 1 round-trip for the update and N round-trips to check for conflicts (or, if
> not using all-or-nothing, N round-trips to check and see if the update was
> successful or not).

Unfortunately, as far as I can see the _all_docs view doesn't support
?conflicts=true. If it did, a single round trip could retrieve the
conflict status of all the documents of interest.

However, if you are doing this, it suggests you are doing something
wrong, because you will almost certainly end up with race conditions.
For example: you could retrieve the _all_docs and find that none of them
are in conflict - hooray! - but one millisecond later someone may
replicate from another database, which will introduce new conflicts. You
have no control over this in the flow of the first part of your
application.

So you really need a strategy for dealing with conflicts which is
asynchronous to application updates. For example:

- every time you *read* a document, check for conflicts and resolve them
  at that point in time
- have a periodic sweep for conflicts (using a view to find them) and
  resolve all those found.

I think only the first is going to give suitable semantics for end-users
of traditional database-style applications. Unless you do this, users
are going to observe updated versions of documents simply "vanishing"
from the database after replication (since a different, conflicting
version is taking precedence over the first one), to reappear at some
future point in time. They will cry "bug".

However, performing this resolution on every single read is a complete
PITA for the client side to implement, and easy to forget. To centralise
and enforce this you really need some sort of proxy layer between the
client and the database so that every doc read and view read has this
logic performed.

This is where I think the introductory material on couchdb.apache.org
does CouchDB a big disservice by overhyping: it implies strongly that
distributed, replicated applications are easy to write with CouchDB,
but I don't think the reality matches that, at least not yet.

Of course, CouchDB is applied successfully to many projects.
I have a suspicion that most of them either (a) don't use replication at
all, or (b) replicate mainly in a master->slave fashion, or (c) follow an
"append-only" pattern (new documents are added, but old documents are
rarely modified). In these cases, the conflict issue is dodged entirely,
which means CouchDB's built-in "support" for replication conflict
handling is moot.

> Isn't there any way for the response to a bulk update to tell you which
> documents have conflicts or failures?

Sure. In the default operation of POST to _bulk_docs, you will get
individual responses of success or failure for each document, and no
conflicts will be introduced. This is exactly the same as PUTting each
document individually, but with lower HTTP overhead.

I'm not sure who will use the new "all_or_nothing":true, or why.
Presumably it's possible to build some useful transactional semantics on
top of this, with care. Maybe we will have to buy the book to find out :-)

Regards,

Brian.
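For illustration, a minimal Python sketch of the two client-side checks
discussed in this thread: reading the per-document results of a default
(non all_or_nothing) _bulk_docs POST, and spotting conflicts on a
document read with ?conflicts=true. It works on already-decoded JSON
only, so it needs no server. It assumes the _bulk_docs response is a
list of objects, each with an "id" plus either a "rev" (write accepted)
or an "error"/"reason" pair (write rejected), and that losing revisions
show up in a "_conflicts" array. Those shapes match current CouchDB
releases; the 2009-era server may differ. The function names and sample
data are hypothetical.

```python
from typing import Any


def split_bulk_results(
    results: list[dict[str, Any]],
) -> tuple[dict[str, str], dict[str, str]]:
    """Partition a decoded _bulk_docs response into (saved, failed).

    Hypothetical helper. Assumed entry shape (current CouchDB format):
    "id" plus either "rev" (accepted) or "error"/"reason" (rejected).
    """
    saved: dict[str, str] = {}   # doc id -> new revision
    failed: dict[str, str] = {}  # doc id -> error name, e.g. "conflict"
    for entry in results:
        if "error" in entry:
            failed[entry["id"]] = entry["error"]
        else:
            saved[entry["id"]] = entry["rev"]
    return saved, failed


def conflicting_revs(doc: dict[str, Any]) -> list[str]:
    """Return the losing revisions of a doc fetched with ?conflicts=true.

    Hypothetical helper. The _conflicts field is only present when
    conflicts exist, so a missing field means no conflicts right now.
    """
    return list(doc.get("_conflicts", []))


if __name__ == "__main__":
    # Made-up response, for illustration only.
    response = [
        {"id": "a", "rev": "1-abc"},
        {"id": "b", "error": "conflict", "reason": "Document update conflict."},
    ]
    saved, failed = split_bulk_results(response)
    print(saved)   # {'a': '1-abc'}
    print(failed)  # {'b': 'conflict'}

    # Made-up doc, as it might look when read after a replication.
    doc = {"_id": "b", "_rev": "2-def", "_conflicts": ["2-xyz"]}
    print(conflicting_revs(doc))  # ['2-xyz']
```

Because the default mode rejects a conflicting write rather than storing
it, anything in "failed" with error "conflict" has to be re-read, merged
and resubmitted by the application. conflicting_revs() only becomes
non-empty once replication has brought in a competing revision, which is
why the check has to run on every read (or in a periodic sweep), not
just after your own updates.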
