On Tue, Mar 17, 2009 at 8:10 AM, Tim Parkin <[email protected]> wrote:
> I've been thinking about the change in bulk docs behaviour and wanted to
> discuss online but it's difficult to get my thoughts across
> conversationally so I've written a little 'article'. I'd love feedback
> on it and if we can get some conclusions will write up a final document
> about the issues as a wiki page.
>
> Summary
> =======
>
> What is this about
> ------------------
>
> Prior to the 0.9 release, it was possible to make atomic operations
> against a local database using the bulk_docs functionality. This allowed
> a group of operations to be either all applied without error or conflict
> or not applied at all.
>
> The 0.9 release of CouchDB changed the functionality so that it would
> only fail to apply the changes on validation or system error. If the
> last change in a group of operations was in conflict (e.g. an out of
> date rev because someone had changed a document in the meantime) then
> the change would still be applied but a conflict flag would be attached
> and a message returned listing which operations were in conflict.
>
> Why is it an issue
> ------------------
>
> This feature made it possible to provide simple success/fail wrappers
> around operations (for instance, an API call or a web request). The
> change means that conflict resolution has to be handled in some way for
> any bulk operation.
>
> Examples of Issues faced due to change in functionality
> =======================================================
>
> We'll use an example where we have patient and doctor documents, where
> the patient holds a reference to some of their doctor's data
> [Hospital_ID, Name and Surgery]
>
> Here are a couple of examples of operations and some notes/questions..
>
> Simple change of the Doctor's details
> =====================================
>
> If an admin changes the doctor's Surgery then the following should happen
>
> i) Make change to doctor
> ii) Make change to any patients referring to doctor
>
> If step two fails (i.e. changing the patient's reference to the doctor)
> -----------------------------------------------------------------------
>
> We can
>
> a) roll back the changes
> b) ignore the failure
> c) reload the patient and try to apply the change again
>
> Option b) isn't really an option.. we can't have patients being sent to
> the wrong surgery. Anyway if we did, how would we let the admin fix
> things?
>
> How do we roll back the change to the doctor?
>
> If step one fails (i.e. changing the doctor's surgery)
> ------------------------------------------------------
>
> We can
>
> a) roll back the changes?
> b) accept the failure
>
> If we accept the failure, we have to report back to the user that half
> of their changes succeeded. What does the user do then?
>
> How do we roll back the change to the doctor?
>
> For both of these examples, the only realistic way we can see of
> recovering for the administrator is to roll back the changes and tell
> them that their 'change' failed..
>
>
> Two Patients changing references because a doctor changes
> =========================================================
>
> A doctor (D) has two patients (P1 and P2)
>
> If an admin changes the doctor's Surgery then the following should happen
>
> i) Make change to D
> ii) Make change to P1
> iii) Make changes to P2
>
>
> If step iii) fails (i.e. someone has changed P2 in the meantime)
> ----------------------------------------------------------------
>
> With all_or_nothing false (default) :-
> .......................................
>
> We have inconsistent data where P2 contains the wrong surgery. We can:-
>
> a) roll back the changes?
> b) accept the failure
>
> a) The problem we have is that the Doctor change applied successfully,
> as did the first Patient change, so how do we roll back?
>
> b) If we accept it, what do we report to the administrator interface?
>
> With all_or_nothing true :-
> ............................
>
> We now have a conflict on P2 and we don't know whether it contains our
> change or not (and someone else's legitimate changes may have been
> affected).
>
> We now need a plan on how to resolve this conflict. Because we know
> nothing about the previous change that we are conflicting with, the only
> way to resolve it is to remove our change (we can't delete someone else's
> change without unknown repercussions). So how do we remove our change?
>
> As far as we can see the only consistent way to report this to the
> administrator is to revert all changes and report failure..
>
> Anyway -- let's see how to deal with accepting the failure in different
> places
>
> Accepting that conflicts exist
> ==============================
>
> Because of the nature of CouchDB we accept that conflicts may exist.
> This does not mean that we don't care about minimising users' exposure to
> these changes.
>
> Let's think about a possible result of an accepted conflict.
>
>        / r2[r3]---r4[r3]
>       /
> r1 *
>       \
>        \ r3 (failed conflict)
>
> What we have here is a document which starts at revision 1.
>
> A change is made creating r2
>
> A change is applied to r1 which conflicts, r2 is chosen as the winning
> rev and r3 is saved on it as a conflict
>
> A change is made to r2 to create r4 but the conflict flag still points
> at r3
>
> If we want to rebase our document using r3 instead of r2, we have to
> work out a way to apply r4's changes to r2.
>
> This could potentially be conflicting (application dependent) or it may
> be possible to merge changes (if the changes are across different json
> elements and the document doesn't have references)
>
>
> Conflicts exist at replication outside of a user interaction
> -------------------------------------------------------------
>
> If these conflicts happen during replication, then the failure can be
> dealt with without affecting users working on a single node. This is an
> 'offline' job potentially and the number of occurrences of conflicts
> should be lower but also confined to a point in time (i.e. when you
> replicate).
>
>
> Conflicts exist on a single node because of a single user operation
> -------------------------------------------------------------------
>
> If the conflicts happened during a simple change, the person making those
> changes will have to be informed of the problem and be given the options
> to resolve that problem. Most users will only want to see a binary
> 'worked/didn't work' result and won't be informed enough to deal with the
> subtleties of rebasing changesets
>
> Conflicts during normal operation affect individual user interface views
> and will occur at a greater frequency and distributed in time.
>
>
> What this means to dealing with users.
> ======================================
>
> For most web applications, I would imagine a single user will be dealing
> with a single database instance. Because of this consciously chosen
> specialisation it would be nice to have the tools available to make
> single database instance operations as simple as they need to be.
>
> Removing the previous bulk docs atomic operation makes these user
> interface operations unnecessarily complex.
>
> Trying to provide a self consistent view and simple user interaction
> with this single database instance is fundamentally different than the
> eventual consistency and conflict resolution that is required
> occasionally on a single node and on database replication.
>
> The steps that developers will have to take to provide conflict
> resolution during user interface transactions (in order to provide user
> interface consistency) will probably not be the same steps that they
> will have to take to deal with conflict resolution in the general sense
> (i.e. background conflict checking and conflict checking at
> replication).
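
The binary 'worked/didn't work' result described above can be sketched on the client side. This is only an illustration, not CouchDB's own API: it assumes the decoded bulk docs response is a list with one entry per submitted document, where a rejected document carries an "error" field. The function names, document ids and rev values are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_bulk_results(results: List[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Split a decoded bulk docs response into saved and rejected doc ids.

    Assumption: each entry has an "id", and rejected entries also have
    an "error" field (e.g. "conflict").
    """
    saved: List[str] = []
    rejected: List[str] = []
    for row in results:
        if "error" in row:
            rejected.append(row["id"])
        else:
            saved.append(row["id"])
    return saved, rejected


def bulk_succeeded(results: List[Dict[str, Any]]) -> bool:
    """True only if every document in the group was saved."""
    _, rejected = split_bulk_results(results)
    return not rejected


# Hypothetical response for the doctor (D) and two patients (P1, P2) example,
# where someone changed P2 in the meantime.
example = [
    {"id": "doctor-d", "rev": "2-aaa"},
    {"id": "patient-p1", "rev": "2-bbb"},
    {"id": "patient-p2", "error": "conflict", "reason": "Document update conflict."},
]
saved, rejected = split_bulk_results(example)
```

Note that this only detects the partial failure. The documents listed in `saved` have already been written, so reporting a clean 'failed' to the user still leaves the rollback question raised above unanswered.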
>
> I would like to propose that the old bulk_docs functionality be
> reinstated in some way but with enough information for developers to
> understand what it actually means in the context of distributed data
> (i.e. it is a tool to improve consistency not to guarantee consistency).
>
> Conclusion
> ==========
>
> My understanding for the reason to exclude bulk_docs is to force people
> into dealing with distributed conflict resolution (i.e. to prevent
> people from using bulk docs as a crutch or using it as an indicator of
> atomicity). However, solving the issues raised here will not mean that
> the general conflict resolution problem is also solved. The two problems
> are very different as would be the probable technical solutions.
>
> Multiple query operations against a single node initiated by a user
> usually require a success/fail result. Dealing with conflicts during a
> user 'request' complicates the writing of CouchDB backed user
> interfaces significantly. We feel the reintroduction of some form of the
> previous bulk_docs functionality (with appropriate caveats and name
> convention, etc) is critical to some real world applications and will
> provide more real world benefits than philosophical drawbacks.
>

Interesting read -- and good examples. But I would argue there are more than philosophical drawbacks. As I understand it, it would mean giving up the replication feature entirely. Forever...or at least as long as bulk-docs are relied upon. There's more to replication than scaling (fault tolerance, for one).

If your application absolutely needs transactions, and you can't design around it (e.g. doc-level transactions), you may need another tool for the job -- one not named for a "cluster of unreliable commodity hardware".
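
One way to read "design around it" for the doctor/patient example is to keep the surgery only on the doctor document and have each patient store just the doctor's id, so the admin's change becomes a single-document write. The sketch below is an assumption-laden illustration, not CouchDB code: `TinyStore` is a hypothetical in-memory stand-in with a per-document revision check, and the document ids and field names are made up.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """Hypothetical in-memory store with per-document revision checks."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, body, rev=0):
        # A write only succeeds if it names the current revision
        # (0 means "document does not exist yet").
        current = self._docs.get(doc_id)
        current_rev = current["rev"] if current else 0
        if rev != current_rev:
            raise Conflict(doc_id)
        self._docs[doc_id] = {"rev": current_rev + 1, "body": dict(body)}
        return current_rev + 1

    def get(self, doc_id):
        doc = self._docs[doc_id]
        return doc["rev"], dict(doc["body"])


def surgery_for(store, patient_id):
    """Look the surgery up through the doctor instead of copying it."""
    _, patient = store.get(patient_id)
    _, doctor = store.get(patient["doctor_id"])
    return doctor["surgery"]


store = TinyStore()
doctor_rev = store.put("doctor-d", {"name": "D", "surgery": "North"})
store.put("patient-p1", {"doctor_id": "doctor-d"})
store.put("patient-p2", {"doctor_id": "doctor-d"})

# The admin's change touches one document, so it either applies or conflicts.
store.put("doctor-d", {"name": "D", "surgery": "South"}, rev=doctor_rev)
```

The trade-off is an extra lookup when reading a patient, and it does not help when several documents genuinely have to change together.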
