For my part at least, I agree with Adam. BigCouch makes an effort to inform the client when it fails to apply W. I've seen it lead to confusion that the same logic is not applied to R.
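To make the asymmetry concrete, here is a minimal Python sketch of how a client might interpret write statuses today. The helper name `classify_write_status` and its labels are made up for illustration and are not part of CouchDB or any client library; the 201/202/409 meanings are the ones described in this thread.

```python
def classify_write_status(status: int) -> str:
    """Map the HTTP status of a document write to a coarse outcome.

    Hypothetical client-side helper, for illustration only.
    """
    if status == 201:
        return "w_met"      # the requested number of write responses arrived
    if status == 202:
        return "w_not_met"  # at least one copy written, fewer than w responded
    if status == 409:
        return "conflict"   # update conflict reported by the cluster
    return "error"          # anything else is treated as a failure here
```

A read has no counterpart to the 202 case: a 200 looks the same whether or not R was met, which is the inconsistency I mean.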
I also agree that W and R are not binding contracts. There's no agreement protocol to assure that W is met before being committed to disk. But they are exposed as a blocking parameter of the request, so notification being consistent appeared to me to be the best compromise (vs straight up removal).

</JamesM>

> On Mar 31, 2015, at 13:15, Robert Newson <[email protected]> wrote:
>
> If a way can be found that doesn't break things that can be sent in all or
> most cases, sure. It's what a user can really infer from that which I focused
> on. Not as much, I think, as users that want that info really want.
>
>> On 31 Mar 2015, at 21:08, Adam Kocoloski <[email protected]> wrote:
>>
>> I hope we can all agree that CouchDB should inform the user when it is
>> unable to satisfy the requested read "quorum".
>>
>> Adam
>>
>>> On Mar 31, 2015, at 3:20 PM, Paul Davis <[email protected]> wrote:
>>>
>>> Sounds like there's a bit of confusion here.
>>>
>>> What Nathan is asking for is the ability to have Couch respond with some
>>> information on the actual number of replicas that responded to a read
>>> request. That way a user could tell that they issued an r=2 request when
>>> only r=1 was actually performed. Depending on your point of view in an MVCC
>>> world this is either a bug or a feature. :)
>>>
>>> It was generally agreed upon that if we could return this information it
>>> would be beneficial. Although what happened when I started implementing
>>> this patch was that we are either only able to return it in a subset of
>>> cases where it happens, return it inconsistently between various responses,
>>> or break replication.
>>>
>>> The three general methods for this would be to either include a new
>>> "_r_met" key in the doc body that would be a boolean indicating if the
>>> requested read quorum was actually met for the document. The second was to
>>> return a custom X-R-Met type header, and lastly was the status code as
>>> described.
>>>
>>> The _r_met member was thought to be the best, but unfortunately that breaks
>>> replication with older clients because we throw an error rather than ignore
>>> any unknown underscore prefixed field name. Thus having something that was
>>> just dynamically injected into the document body was a non-starter.
>>> Unfortunately, if we don't inject into the document body then we limit
>>> ourselves to only the set of APIs where a single document is returned. This
>>> is due to both streaming semantics (we can't buffer an entire response in
>>> memory for large requests to _all_docs) as well as multi-doc responses (a
>>> single boolean doesn't say which document may have not had a properly met
>>> R).
>>>
>>> On top of that, the other confusing part of meeting the read quorum is that
>>> given MVCC semantics it becomes a bit confusing on how you respond to
>>> documents with different revision histories. For instance, if we read two
>>> docs, we have technically made the r=2 requirement, but what should our
>>> response be if those two revisions are different (technically, in this case
>>> we wait for the third response, but the decision on what to return for the
>>> "r met" value is still unclear).
>>>
>>> While I think everyone is in agreement that it'd be nice to return some of
>>> the information about the copies read, I think it's much less clear what and
>>> how it should be returned in the multitude of cases that we can specify a
>>> value for R.
>>>
>>> While that doesn't offer a concrete path forward, hopefully it clarifies
>>> some of the issues at hand.
>>>
>>> On Tue, Mar 31, 2015 at 1:47 PM, Robert Samuel Newson <[email protected]>
>>> wrote:
>>>
>>>> It’s testament to my friendship with Mike that we can disagree on such
>>>> things and remain friends. I am sorry he misled you, though.
>>>>
>>>> CouchDB 2.0 (like Cloudant) does not have read or write quorums at all, at
>>>> least in the formal sense, the only one that matters, this is unfortunately
>>>> sloppy language in too many places to correct.
>>>>
>>>> The r= and w= parameters control only how many of the n possible responses
>>>> are collected before returning an http response.
>>>>
>>>> It’s not true that returning 202 in the situation where one write is made
>>>> but fewer than 'r' writes are made means we’ve chosen availability over
>>>> consistency since even if we returned a 500 or closed the connection
>>>> without responding, a subsequent GET could return the document (a
>>>> probability that increases over time as anti-entropy makes the missing
>>>> copies). A write attempt that returned a 409 could, likewise, introduce a
>>>> new edit branch into the document, which might then 'win', altering the
>>>> results of a subsequent GET.
>>>>
>>>> The essential thing to remember is this: the ’n’ copies of your data are
>>>> completely independent when written/read by the clustered layer (fabric).
>>>> It is internal replication (anti-entropy) that converges those copies,
>>>> pair-wise, to the same eventual state. Fabric is converting the 3
>>>> independent results into a single result as best it can. Older versions did
>>>> not expose the 201 vs 202 distinction, calling both of them 201. I do agree
>>>> with you that there’s little value in the 202 distinction. About the only
>>>> thing you could do is investigate your cluster for connectivity issues or
>>>> overloading if you get a sustained period of 202’s, as it would be an
>>>> indicator that the system is partitioned.
>>>>
>>>> In order to achieve your goals, CouchDB 2.0 would have to ensure that the
>>>> result of a write did not change after the fact. That is, anti-entropy
>>>> would need to be disabled, or somehow agree to roll forward or backward
>>>> based on the initial circumstances.
>>>> In short, we’d have to introduce strong
>>>> consistency (paxos or raft or zab, say). While this would be a great
>>>> feature to add, it’s not currently present, and no amount of twiddling the
>>>> status codes will achieve it. We’d rather be honest about our position on
>>>> the CAP triangle.
>>>>
>>>> B.
>>>>
>>>>> On 30 Mar 2015, at 22:37, Nathan Vander Wilt <[email protected]> wrote:
>>>>>
>>>>> A technical co-founder of Cloudant agreed that this was a bug when I
>>>>> first hit it a few years ago. I found back the original thread here; this
>>>>> is the discussion I was trying to recall in my OP:
>>>>> It sounds like perhaps there is a related issue tracked internally at
>>>>> Cloudant as a result of that conversation.
>>>>>
>>>>> JamesM, thanks for your support here and tracking this down. 203 seemed
>>>>> like the best status code to "steal" for this to me too. Best wishes in
>>>>> getting this fixed!
>>>>>
>>>>> regards,
>>>>> -natevw
>>>>>
>>>>>> On Mar 25, 2015, at 4:49 AM, Robert Newson <[email protected]> wrote:
>>>>>>
>>>>>> 2.0 is explicitly an AP system, the behaviour you describe is not
>>>>>> classified as a bug.
>>>>>>
>>>>>> Anti-entropy is the main reason that you cannot get strong consistency
>>>>>> from the system, it will transform "failed" writes (those that succeeded on
>>>>>> one node but fewer than R nodes) into success (N copies) as long as the
>>>>>> nodes have enough healthy uptime.
>>>>>>
>>>>>> True of cloudant and 2.0.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On 24 Mar 2015, at 15:14, Mutton, James <[email protected]> wrote:
>>>>>>>
>>>>>>> Funny you should mention it. I drafted an email in early February to
>>>>>>> queue up the same discussion whenever I could get involved again (which I
>>>>>>> promptly forgot about). What happens currently in 2.0 appears unchanged
>>>>>>> from earlier versions.
>>>>>>> When R is not satisfied in fabric,
>>>>>>> fabric_doc_open:handle_message eventually responds with a {stop, …} but
>>>>>>> leaves the acc-state as the original r_not_met which triggers a read_repair
>>>>>>> from the response handler. read_repair results in an {ok, …} with the only
>>>>>>> doc available, because no other docs are in the list. The final doc
>>>>>>> returned to chttpd_db:couch_doc_open and thusly to chttpd_db:db_doc_req is
>>>>>>> simply {ok, Doc}, which has now lost the fact that the answer was not
>>>>>>> complete.
>>>>>>>
>>>>>>> This seems straightforward to fix by a change in
>>>>>>> fabric_open_doc:handle_response and read_repair. handle_response knows
>>>>>>> whether it has R met and could pass that forward, or allow read-repair to
>>>>>>> pass it forward if read_repair is able to satisfy acc.r. I can’t speak for
>>>>>>> community interest in the behavior of sending a 202, but it’s something I’d
>>>>>>> definitely like for the same reasons you cite. Plus it just seems
>>>>>>> disconnected to do it on writes but not reads.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> </JamesM>
>>>>>>>
>>>>>>>> On Mar 24, 2015, at 14:06, Nathan Vander Wilt <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Sorry, I have not been following CouchDB 2.0 roadmap but I was
>>>>>>>> extending my fermata-couchdb plugin today and realized that perhaps the
>>>>>>>> Apache release of BigCouch as CouchDB 2.0 might provide an opportunity to
>>>>>>>> fix a serious issue I had using Cloudant's implementation.
>>>>>>>>
>>>>>>>> See
>>>>>>>> https://github.com/cloudant/bigcouch/issues/55#issuecomment-30186518 for
>>>>>>>> some additional background/explanation, but my understanding is that
>>>>>>>> Cloudant for all practical purposes ignores the read durability parameter.
>>>>>>>> So you can write with ?w=N to attempt some level of quorum, and get a 202
>>>>>>>> back if that quorum is unmet.
>>>>>>>> _However_ when you ?r=N it really doesn't
>>>>>>>> matter if only <N nodes are available…if even just a single available node
>>>>>>>> has some version of the requested document you will get a successful
>>>>>>>> response (!).
>>>>>>>>
>>>>>>>> So in practice, there's no way to actually use the quasi-Dynamo
>>>>>>>> features to dynamically _choose_ between consistency or availability: when
>>>>>>>> it comes time to read back a consistent result, BigCouch instead just
>>>>>>>> always gives you availability* regardless of what a given request actually
>>>>>>>> needs. (In my usage I ended up treating a 202 write as a 500, rather than
>>>>>>>> proceeding with no way of ever knowing whether a write did NOT ACTUALLY
>>>>>>>> conflict or just hadn't YET because $who_knows_how_many nodes were still
>>>>>>>> down…)
>>>>>>>>
>>>>>>>> IIRC, this was both confirmed and acknowledged as a serious bug by a
>>>>>>>> Cloudant engineer (or support personnel at least) but could not be quickly
>>>>>>>> fixed as it could introduce backwards-compatibility concerns. So…
>>>>>>>>
>>>>>>>> Is CouchDB 2.0 already breaking backwards compatibility with
>>>>>>>> BigCouch? If true, could this read durability issue now be fixed during the
>>>>>>>> merge?
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> -natevw
>>>>>>>>
>>>>>>>> * DISCLAIMER: this statement has not been endorsed by actual uptime
>>>>>>>> of *any* Couch fork…
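
For anyone following along without the Erlang in front of them, the behaviour discussed in this thread (collect replica replies until R of them agree, and carry the "R met" fact forward instead of collapsing to a plain success) can be sketched roughly as below. This is a simplified Python model, not the fabric code: the function name `open_doc`, the use of plain revision strings, and the tie-break rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def open_doc(replies: Iterable[Optional[str]], r: int) -> Tuple[Optional[str], bool]:
    """Collect replica replies until r of them agree on a revision.

    Each reply is a revision id, or None for a replica that failed or
    timed out. Returns (revision, r_met). Hypothetical model only.
    """
    seen: Counter = Counter()
    for rev in replies:
        if rev is None:
            continue  # unreachable replica: nothing to count
        seen[rev] += 1
        if seen[rev] >= r:
            return rev, True  # r matching replies: stop early
    if not seen:
        return None, False  # no replica answered at all
    # Fewer than r replicas agreed. Return what we have, but keep the
    # r_met flag rather than reporting an unqualified success.
    # Choosing the most common revision (ties broken by the larger id)
    # is an assumption of this sketch, not CouchDB's winner rule.
    best = max(seen, key=lambda rev: (seen[rev], rev))
    return best, False
```

With three replicas and r=2, `open_doc(["1-a", None, None], 2)` yields `("1-a", False)`: the document is still returned, but the caller can see that R was not met. How that flag would then be surfaced (status code, header, or body member) is the open question above.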
