Re: single node atomic bulk_docs operations

Dean Landolt Tue, 17 Mar 2009 06:17:02 -0700

On Tue, Mar 17, 2009 at 8:10 AM, Tim Parkin <[email protected]> wrote:


> I've been thinking about the change in bulk docs behaviour and wanted to
> discuss online but it's difficult to get my thoughts across
> conversationally so I've written a little 'article'. I'd love feedback
> on it and if we can get some conclusions will write up a final document
> about the issues as a wiki page.
>
> Summary
> =======
>
> What is this about
> ------------------
>
> Prior to the 0.9 release, it was possible to make atomic operations
> against a local database using the bulk_docs functionality. This allowed
> a group of operations to be either all applied without error or conflict
> or not applied at all.
>
> The 0.9 release of CouchDB changed the functionality so that it would
> only fail to apply the changes on validation or system error. If the
> last change in a group of operations was in conflict (e.g. an
> out-of-date rev because someone had changed a document in the meantime) then
> the change would still be applied but a conflict flag would be attached
> and a message returned listing which operations were in conflict.
>
> Why is it an issue
> ------------------
>
> This feature made it possible to provide simple success/fail wrappers
> around operations (for instance, an API call or a web request). The
> change means that conflict resolution has to be handled in some way for
> any bulk operation.
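
A toy sketch of the success/fail wrapper described above, assuming an
in-memory dict in place of a real database. This is not CouchDB code:
the names `bulk_atomic` and `handle_request` and the integer revs are
made up for illustration. It only models the pre-0.9 behaviour as the
article describes it: every update is applied, or none is.

```python
def bulk_atomic(store, updates):
    """Apply every update or none (toy model of the pre-0.9 description)."""
    # First pass: refuse the whole batch if any update is based on a stale rev.
    for update in updates:
        current = store.get(update["id"])
        if current is not None and current["rev"] != update["rev"]:
            return False  # nothing has been written
    # Second pass: no conflicts were found, so write everything.
    for update in updates:
        current = store.get(update["id"])
        next_rev = (current["rev"] if current else 0) + 1
        store[update["id"]] = {"rev": next_rev, "body": dict(update["body"])}
    return True


def handle_request(store, updates):
    """The binary result a web request or API call would hand back."""
    return "ok" if bulk_atomic(store, updates) else "failed"
```

With this shape the caller never sees a half-applied batch, which is
the property the article says was lost.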
>
> Examples of Issues faced due to change in functionality
> =======================================================
>
> We'll use an example with patient and doctor documents, where each
> patient holds a reference to some of their doctor's data
> [Hospital_ID, Name and Surgery].
>
> Here are a couple of example operations and some notes/questions.
>
> Simple change of the Doctor's details
> =====================================
>
> If an admin changes the doctor's Surgery then the following should happen:
>
>  i) Make change to doctor
>  ii) Make change to any patients referring to doctor
>
> If step two fails (i.e. changing the patients' reference to the doctor)
> -----------------------------------------------------------------------
>
> We can
>
>   a) rollback the changes
>   b) ignore the failure
>   c) reload the patient and try to apply the change again
>
>   Option b) isn't really an option: we can't have patients being sent to
>   the wrong surgery. Anyway, if we did, how would we let the admin fix
>   things?
>
>   How do we roll back the change to the doctor?
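
Option c) above, reloading the patient and applying the change again,
can be sketched as a retry loop. Everything here is hypothetical:
`ToyDB`, `Conflict` and `update_with_retry` are stand-ins for a real
client library, and revs are plain integers.

```python
class Conflict(Exception):
    """Raised when a document is saved against an out-of-date rev."""


class ToyDB:
    """Minimal in-memory stand-in for a single database node."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # return a copy, like a fresh fetch

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        saved = dict(doc)
        saved["_rev"] = (current["_rev"] if current else 0) + 1
        self._docs[doc["_id"]] = saved
        return saved["_rev"]


def update_with_retry(db, doc_id, change, attempts=3):
    """Reload the document and re-apply `change` until the save succeeds."""
    for _ in range(attempts):
        doc = db.get(doc_id)  # always start from the latest revision
        change(doc)
        try:
            db.put(doc)
            return True
        except Conflict:
            continue  # someone else saved first; reload and try again
    return False
```

This only works when `change` is safe to re-apply on top of whatever
the other writer did, and it does nothing about the doctor change that
has already been saved.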
>
> If step one fails (i.e. changing the doctor's surgery)
> ------------------------------------------------------
>
> We can
>
>  a) rollback the changes?
>  b) accept the failure
>
> If we accept the failure, we have to report back to the user that half
> of their changes succeeded. What does the user do then?
>
> How do we roll back the change to the doctor?
>
> For both of these examples, the only realistic way we can see for the
> administrator to recover is to roll back the changes and tell them
> that their 'change' failed.
>
>
> Two Patients changing references because a doctor changes
> =========================================================
>
> A doctor (D) has two patients (P1 and P2).
>
> If an admin changes the doctor's Surgery then the following should happen:
>
>    i) Make change to D
>    ii) Make change to P1
>    iii) Make changes to P2
>
>
> If step iii) fails (i.e. someone has changed P2 in the meantime)
> ----------------------------------------------------------------
>
> With all_or_nothing false (default)  :-
> .......................................
>
> We have inconsistent data where P2 contains the wrong surgery. We can:-
>
>    a) rollback the changes?
>    b) accept the failure
>
> a) The problem we have is that the Doctor change applied successfully,
>   as did the first Patient change so how do we rollback?
>
> b) If we accept it, what do we report to the administrator interface?
>
> with all_or_nothing true :-
> ............................
>
> We now have a conflict on P2 and we don't know whether it contains our
> change or not (and someone else's legitimate changes may have been
> affected).
>
> We now need a plan for how to resolve this conflict. Because we know
> nothing about the previous change that we are conflicting with, the only
> way to resolve it is to remove our change (we can't delete someone else's
> change without unknown repercussions). So how do we remove our change?
>
> As far as we can see, the only consistent way to report this to the
> administrator is to revert all changes and report failure.
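
The D/P1/P2 scenario above can be walked through with a toy model of
the two modes. This is a simplification written for this thread, not
CouchDB's implementation: `bulk_docs` and `fresh_store` are made-up
names, revs are integers, and the already-stored revision is always
kept as the winner when a conflict is saved.

```python
def bulk_docs(store, updates, all_or_nothing=False):
    """Toy model: report what happens to each update in the batch."""
    results = {}
    for update in updates:
        entry = store[update["id"]]
        stale = entry["rev"] != update["rev"]
        if stale and not all_or_nothing:
            results[update["id"]] = "rejected"  # default: this doc is skipped
        elif stale:
            entry["conflicts"].append(dict(update["body"]))
            results[update["id"]] = "saved as conflict"
        else:
            entry["body"] = dict(update["body"])
            entry["rev"] += 1
            results[update["id"]] = "saved"
    return results


def fresh_store():
    """D and P1 are untouched; someone else has already moved P2 to rev 2."""
    return {
        "D": {"rev": 1, "body": {"surgery": "Old Road"}, "conflicts": []},
        "P1": {"rev": 1, "body": {"surgery": "Old Road"}, "conflicts": []},
        "P2": {"rev": 2, "body": {"surgery": "Old Road"}, "conflicts": []},
    }


# The admin's batch was prepared when all three documents were at rev 1.
updates = [
    {"id": doc_id, "rev": 1, "body": {"surgery": "New Street"}}
    for doc_id in ("D", "P1", "P2")
]

default_store = fresh_store()
default_result = bulk_docs(default_store, updates)

aon_store = fresh_store()
aon_result = bulk_docs(aon_store, updates, all_or_nothing=True)
```

In the default run D and P1 move to the new surgery while P2 keeps the
old one. In the `all_or_nothing` run P2 carries the new surgery only as
a conflict. Either way the batch is not one clean success or failure.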
>
> Anyway -- let's see how to deal with accepting the failure in different
> places.
>
> Accepting that conflicts exist
> ==============================
>
> Because of the nature of CouchDB we accept that conflicts may exist.
> This does not mean that we don't care about minimising users' exposure
> to these conflicts.
>
> Let's think about a possible result of an accepted conflict.
>
>     / r2[r3]---r4[r3]
>    /
> r1 *
>    \
>     \ r3 (failed conflict)
>
> What we have here is a document which starts at revision 1.
>
> A change is made, creating r2.
>
> Another change is applied to r1, which conflicts. r2 is chosen as the
> winning rev and r3 is saved on it as a conflict.
>
> A change is made to r2 to create r4, but the conflict flag still points
> at r3.
>
> If we want to rebase our document using r3 instead of r2, we have to
> work out a way to apply r4's changes to r3.
>
> This could potentially be conflicting (application dependent) or it may
> be possible to merge changes (if the changes are across different JSON
> elements and the document doesn't have references).
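
The "different JSON elements" case can be sketched as a three-way
merge over top-level keys, with r1 as the common base. This is an
illustration only: `three_way_merge` is a made-up helper, it does not
look inside nested values, and it knows nothing about references
between documents.

```python
MISSING = object()  # marks a key that is absent from a revision


def three_way_merge(base, ours, theirs):
    """Merge two edits of `base`; return (merged, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, MISSING)
        o = ours.get(key, MISSING)
        t = theirs.get(key, MISSING)
        if o == t:        # both sides agree (or neither touched the key)
            value = o
        elif o == b:      # only their side changed it
            value = t
        elif t == b:      # only our side changed it
            value = o
        else:             # both changed it differently: needs a human
            conflicts.append(key)
            value = o     # keep ours provisionally
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts
```

For the tree above, `base` would be r1, `ours` r3 and `theirs` r4. An
empty `conflicts` list means the two edits touched different keys.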
>
>
> Conflicts exist at replication outside of a user interaction
> -------------------------------------------------------------
>
> If these conflicts happen during replication, then the failure can be
> dealt with without affecting users working on a single node. This is
> potentially an 'offline' job, and the number of occurrences of conflicts
> should be lower but also confined to a point in time (i.e. when you
> replicate).
>
>
> Conflicts exist on a single node because of a single user operation
> -------------------------------------------------------------------
>
> If the conflicts happen during a simple change, the person making those
> changes will have to be informed of the problem and be given the options
> to resolve that problem. Most users will only want to see a binary
> 'worked/didn't work' result and won't be informed enough to deal with
> the subtleties of rebasing changesets.
>
> Conflicts during normal operation affect individual user interface views
> and will occur at a greater frequency and be distributed in time.
>
>
> What this means for dealing with users
> ======================================
>
> For most web applications, I would imagine a single user will be dealing
> with a single database instance. Because of this consciously chosen
> specialisation it would be nice to have the tools available to make
> single database instance operations as simple as they need to be.
>
> Removing the previous bulk docs atomic operation makes these user
> interface operations unnecessarily complex.
>
> Trying to provide a self-consistent view and simple user interaction
> with this single database instance is fundamentally different than the
> eventual consistency and conflict resolution that is required
> occasionally on a single node and on database replication.
>
> The steps that developers will have to take to provide conflict
> resolution during user interface transactions (in order to provide user
> interface consistency) will probably not be the same steps that they
> will have to take to deal with conflict resolution in the general sense
> (i.e. background conflict checking and conflict checking at
> replication).
>
> I would like to propose that the old bulk_docs functionality be
> reinstated in some way but with enough information for developers to
> understand what it actually means in the context of distributed data
> (i.e. it is a tool to improve consistency not to guarantee consistency).
>
> Conclusion
> ==========
>
> My understanding is that the reason for removing atomic bulk_docs is to
> force people into dealing with distributed conflict resolution (i.e. to
> prevent people from using bulk docs as a crutch or using it as an
> indicator of atomicity). However, solving the issues raised here will
> not mean that the general conflict resolution problem is also solved.
> The two problems are very different, as would be the probable technical
> solutions.
>
> Multiple query operations against a single node, initiated by a user,
> usually require a success/fail result. Dealing with conflicts during a
> user 'request' complicates the writing of CouchDB-backed user
> interfaces significantly. We feel the reintroduction of some form of the
> previous bulk_docs functionality (with appropriate caveats and name
> convention, etc) is critical to some real world applications and will
> provide more real world benefits than philosophical drawbacks.
>
>
Interesting read -- and good examples. But I would argue there are more than
philosophical drawbacks. As I understand it, it would mean giving up the
replication feature entirely. Forever...or at least as long as bulk-docs are
relied upon. There's more to replication than scaling (fault tolerance, for
one). If your application absolutely needs transactions, and you can't
design around it (e.g. doc-level transactions), you may need another tool
for the job -- one not named for a "cluster of unreliable commodity
hardware".
