Interesting.

1. end the response ("uncleanly") - does this mean the HTTP response
wouldn't be valid JSON? I guess the HTTP response code can't be expected
to reflect a non-normal result. Maybe a trailing attribute in the JSON
could indicate that the response was truncated because of txn_too_long,
to distinguish it from completed responses with less than a page size
(e.g. limit=20k, 18k records sent, no more records present)?
Can bookmark/etc still be included at that point to resume in
closest-key-order? Even though it's a streaming, not paginated, response,
it would match pre-v4 semantics of pagination over multiple HTTP requests,
right?

2. Sending a 400 error seems like a good way to waste requests. I imagine
there's no constant limit= that can avoid the issue, so people will have
to do things that are sensitive to the presence of the limit one way or
other. I'd way rather get a partial response with a flag indicating I
should resume from <x> - but maybe that's the "rewrite the app" scenario
Nick described designing to avoid.

4. request-level isolation=(TRUE|false) could be a way to express a
default preference for 1, but allow opt-in for streaming the fresher
rows. I'd want to be able to know what kind of boundaries are used for
switching to a newer txn snapshot - obviously there's a practical outer
limit from FDB but is it a performance hit to switch with some greater
frequency like 1000-rows, an FDB index-page-size if there's such a thing,
every 250ms or similar?

On Tue, Jul 14, 2020 at 7:18 AM Robert Samuel Newson <rnew...@apache.org>
wrote:

> Thanks Nick, very helpful, and it vindicates me opening this thread.
>
> I don't accept Mike Rhodes' argument at all but I should explain why I
> don't;
>
> In CouchDB 1.x, a response was generated from a single .couch file. There
> was always a window between the start of the request as the client sees it
> and CouchDB acquiring a snapshot of the relevant database. I don't think
> that gap is meaningful, and it does not refute our statements of the time
> that CouchDB responses are from a snapshot (specifically, that no change
> to the database made _during_ the response will be visible in _this_
> response). In CouchDB 2.x (and continuing in 3.x), a CouchDB database
> typically consists of multiple shards, each of which, once opened, remains
> snapshotted for the duration of that response.
> The difference between 1.x and 2.x/3.x is that the window is potentially
> larger (though the requests are issued in parallel). The response,
> however much it returned, was impervious to changes in other requests
> once it has begun.
>
> I don't think _all_docs, _view or a non-continuous _changes response
> should allow changes made in other requests to appear midway through them
> and I want to hear the opinions of folks that have watched over CouchDB
> from its earliest days on this specific point (If I must name names, at
> least Adam K, Paul D, Jan L, Joan T). If there's a majority for deviating
> from this semantic, I will go with the majority.
>
> If we were to agree to preserve the 'single snapshot' behaviour, what
> would the behaviour be if we can't honour it because of the FoundationDB
> transaction limits?
>
> I see a few options.
>
> 1) We could end the response uncleanly, mid-response. CouchDB does this
> when it has no alternative, and it is ugly, but it is usually handled well
> by clients. They are at least not usually convinced they got a complete
> response if they are using a competent HTTP client.
>
> 2) We could disavow the streaming API, as you've suggested, attempt to
> gather the full response. If we do this within the FDB bounds, return a
> 200 code and the response body. A 400 and an error body if we don't.
>
> 3) We could make the "limit" parameter mandatory and with an upper bound,
> in combination with 1 or 2, such that a valid request is very likely to be
> served within the limits.
>
> I'd like to hear more voices on which way we want to break the
> unachievable semantic of old where you could read _all_docs on a billion
> document database over, uptime gods willing, a snapshot of the database.
>
> B.
>
> > On 13 Jul 2020, at 21:15, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >
> > Thanks for bringing the topic up for the discussion!
> >
> > For background, this topic was discussed on the mailing list starting
> > in February, 2019
> > https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
> >
> > The primary reason for restart_tx option is to provide compatibility
> > for _changes feeds to allow older replicators to handle 4.0 sources.
> > It starts a new transaction after 5 seconds or so (a current FDB
> > limitation, might go up in the future) and transparently continues to
> > stream data where it left off. Ex, streaming [a,b,c,d], times out
> > after b, then it will continue with c, d etc. Currently this is also
> > used for other streaming APIs as an alternative to returning mangled
> > JSON after emitting a 200 response and streaming some of the rows.
> > However it is not used for paginated responses, the new APIs developed
> > by Ilya. So users have an option to get the guaranteed snapshot
> > behavior option as well.
> >
> > And for completeness, if we decide to remove the option, we should
> > specify what happens if we remove it and get a transaction_too_old
> > exception. Currently the behavior would be to restart the transaction,
> > resend all the headers and all the rows again down the socket, which I
> > don't think anyone wants, but is what we'd get if we just make
> > {restart_tx, false}
> >
> >> I understand that automatically resetting the FDB txn during a
> >> response is an attempt to work around that and maintain
> >> "compatibility" with CouchDB < 4 semantics. I think it fails to do so
> >> and is very misleading.
> >
> > It is a trade-off in order to keep the same API shape as before. Sure,
> > streaming all the docs with _all_docs or _changes feeds is not a great
> > pattern but many applications are implemented that way already.
> > Letting them migrate to 4.0 without having to rewrite the application
> > with the caveat that they might see a document updated in the
> > _all_docs stream after the request has already started, is a nicer
> > choice, I think, than forcing them to rewrite their application, which
> > could lead to a Python 2/3 scenario.
> >
> > Due to having multiple shards (Q>1), as discussed in the original
> > mailing thread by Mike
> > (https://lists.apache.org/thread.html/r8345f534a6fa88c107c1085fba13e660e0e2aedfd206c2748e002664%40%3Cdev.couchdb.apache.org%3E),
> > we don't provide a strict read-only snapshot guarantee in 2.x and 3.x
> > anyway, so users would have to handle scenarios where a document might
> > appear in the stream that wasn't there at the start of the request
> > already. Though, granted, a much smaller corner case but I wonder how
> > many users care to handle that...
> >
> > Currently users do have an option of using the new paginated API which
> > disables restart_tx behavior
> > https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/chttpd/src/chttpd_db.erl#L947 ,
> > though I am not sure what happens when transaction_too_old exception
> > is thrown then (emit a bookmark?)
> >
> > So based on the compatibility consideration, I'd vote to keep the
> > restart_tx option (configurable perhaps if we figure out what to do
> > when it is disabled) in order to allow users to migrate their
> > application to 4.0. At least informally we promised users to keep a
> > strong API compatibility when we released 3.0 with an eye towards 4.0
> > (https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0/). I'd
> > think not emitting all the data in a _changes or _all_docs response
> > would break that compatibility more than using multiple transactions.
> >
> > As for what happens when a transaction_too_old is thrown, I could see
> > an option passed in, something like, single_snapshot=true, and then
> > use Adam's suggestion to accumulate all the rows in memory and if we
> > hit the end of the transaction return a 400 error. We won't emit
> > anything out while rows are accumulated, so users don't get partial
> > data, it will be every row requested or a 400 error (so no chance of
> > perceived data loss). Users may retry if they think it was a temporary
> > hiccup or may use a small limit number.
> >
> > Cheers,
> > -Nick
> >
> > On Mon, Jul 13, 2020 at 2:05 PM Robert Samuel Newson <rnew...@apache.org> wrote:
> >>
> >> Hi All,
> >>
> >> I'm concerned to see the restart_fold function in fabric2_fdb
> >> (https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L1828)
> >> in the 4.0 development branch.
> >>
> >> The upshot of doing this is that a CouchDB response could be taken
> >> across multiple snapshots of the database, which is not the behaviour
> >> of CouchDB 1 through 3.
> >>
> >> I don't think this is ok (with the obvious and established exception
> >> of a continuous changes feed, where new snapshots are continuously
> >> visible at the end of the response).
> >>
> >> FoundationDB imposes certain limits on transactions, the most notable
> >> being the 5 second maximum duration. I understand that automatically
> >> resetting the FDB txn during a response is an attempt to work around
> >> that and maintain "compatibility" with CouchDB < 4 semantics. I think
> >> it fails to do so and is very misleading.
> >>
> >> Discuss.
> >>
> >> B.
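
To make the two behaviours debated in this thread concrete, here is a
minimal Python sketch. It is not CouchDB or FoundationDB code, and every
name in it is hypothetical: Store is a toy in-memory stand-in,
max_reads_per_txn stands in for the 5 second transaction limit, and the
error body returned with the 400 is invented for illustration.
stream_with_restart mimics the restart_tx idea (continue after the last
emitted key in a fresh snapshot), and collect_single_snapshot mimics the
"accumulate everything or return a 400" idea.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[str, str]


class TransactionTooOld(Exception):
    """Stand-in for FoundationDB's transaction_too_old error."""


class Store:
    """Toy key-value store. Hypothetical; this is NOT the FoundationDB API."""

    def __init__(self, data: Dict[str, str], max_reads_per_txn: int) -> None:
        self.data = dict(data)
        # Assumption: a read budget stands in for the 5 second time limit.
        self.max_reads_per_txn = max_reads_per_txn

    def open_txn(self, start_after: Optional[str]) -> Iterator[Row]:
        # The snapshot is copied here, when the "transaction" is opened.
        snapshot = dict(self.data)
        keys = sorted(k for k in snapshot if start_after is None or k > start_after)

        def rows() -> Iterator[Row]:
            for reads, key in enumerate(keys):
                if reads >= self.max_reads_per_txn:
                    raise TransactionTooOld()
                yield key, snapshot[key]

        return rows()


def stream_with_restart(store: Store) -> Iterator[Row]:
    """restart_tx-style streaming: on expiry, reopen after the last emitted key.

    Rows emitted after a restart come from a newer snapshot.
    Assumes max_reads_per_txn >= 1, otherwise no progress is ever made.
    """
    last_key: Optional[str] = None
    while True:
        try:
            for key, value in store.open_txn(last_key):
                last_key = key
                yield key, value
            return
        except TransactionTooOld:
            continue  # open a new transaction and carry on after last_key


def collect_single_snapshot(store: Store, limit: int) -> Tuple[int, dict]:
    """Buffer up to `limit` rows (limit >= 1) from one snapshot.

    Returns (200, body) with every requested row, or (400, error body) if
    the transaction expires first. Nothing is emitted before completion.
    """
    rows: List[Row] = []
    try:
        for key, value in store.open_txn(None):
            rows.append((key, value))
            if len(rows) >= limit:
                break
    except TransactionTooOld:
        return 400, {"error": "transaction_too_old"}  # hypothetical error body
    return 200, {"rows": rows}
```

With four keys a to d and a budget of two reads per transaction, a write
to d made after b has been streamed shows up in the output of
stream_with_restart, which is exactly the cross-snapshot visibility being
objected to. collect_single_snapshot either returns all requested rows
from one snapshot or a 400, so a small enough limit succeeds and a large
one fails without any partial output.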