Re: [DISCUSS] couchdb 4.0 transactional semantics

Adam Kocoloski Tue, 14 Jul 2020 11:57:29 -0700

Technically, we could certainly terminate a response cleanly when the 
underlying FoundationDB transaction expires and offer a bookmark to resume the 
response using a new transaction in a subsequent request. Some of us have 
reservations about that approach because an application that did not know to 
look for the “txn_too_long” attribute would quietly proceed with an incomplete, 
corrupted dataset. Terminating the response brutally reduces the likelihood of 
that occurring to ~zero.
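
To make that reservation concrete, here is a minimal sketch of the two kinds of client. The response shape is an assumption for illustration: "rows" and "bookmark" are hypothetical field names, "txn_too_long" is the attribute discussed in this thread, and fake_get is a stand-in for an HTTP call, not a CouchDB API.

```python
# Sketch only: the response shape ("rows", "bookmark", "txn_too_long") is assumed.
DOCS = ["a", "b", "c", "d", "e"]

def fake_get(bookmark=None, rows_per_txn=2):
    """Hypothetical server: emits at most rows_per_txn rows per transaction."""
    start = 0 if bookmark is None else DOCS.index(bookmark) + 1
    rows = DOCS[start:start + rows_per_txn]
    resp = {"rows": rows}
    if start + rows_per_txn < len(DOCS):
        # The transaction expired before the range was exhausted.
        resp["txn_too_long"] = True
        resp["bookmark"] = rows[-1]
    return resp

def naive_client():
    """Ignores the flag and silently treats a partial result as complete."""
    return fake_get()["rows"]

def aware_client():
    """Resumes from the bookmark until no truncation flag is present."""
    rows, bookmark = [], None
    while True:
        resp = fake_get(bookmark)
        rows.extend(resp["rows"])
        if not resp.get("txn_too_long"):
            return rows
        bookmark = resp["bookmark"]
```

The naive client sees a well-formed response and has no reason to suspect it is missing rows, which is the failure mode an abrupt termination avoids.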


It’s true that we can’t absolutely guarantee that the database will never 
time out, but setting a reasonable limit of ~2000 rows in a response should make 
it quite unlikely. I’d expect those responses to be delivered in 50ms or less, 
which is 100x faster than the 5 second transaction expiry.

For cases where you’re not concerned about the snapshot isolation (e.g. 
streaming an entire _changes feed), there is a small performance benefit to 
requesting a new FDB transaction asynchronously before the old one actually 
times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
layers but I’m not sure we’ve used it anywhere in CouchDB yet.
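
A minimal sketch of that swap-over pattern, under loud assumptions: FakeTxn is a hypothetical in-memory stand-in, not the FDB client API; the transaction lifetime is modelled as a read budget rather than wall-clock time; and the swap is done synchronously here, whereas the performance benefit comes from starting the new transaction asynchronously.

```python
# Sketch only: FakeTxn is a stand-in for a transaction, not the FDB API.
from bisect import bisect_right

class TxnTooOld(Exception):
    pass

class FakeTxn:
    """Hypothetical transaction: a snapshot that serves at most `budget` reads."""
    def __init__(self, store, budget):
        self.snapshot = dict(store)      # snapshot taken when the txn starts
        self.keys = sorted(self.snapshot)
        self.budget = budget

    def read_after(self, last_key):
        if self.budget == 0:
            raise TxnTooOld()
        self.budget -= 1
        i = 0 if last_key is None else bisect_right(self.keys, last_key)
        if i >= len(self.keys):
            return None                  # end of the key range
        key = self.keys[i]
        return key, self.snapshot[key]

def stream(store, budget=3, swap_at=2):
    """Stream rows in key order, swapping to a new txn before the old one expires."""
    txn, reads, last_key = FakeTxn(store, budget), 0, None
    while True:
        if reads >= swap_at:             # swap early, while the old txn is still valid
            txn, reads = FakeTxn(store, budget), 0
        row = txn.read_after(last_key)
        reads += 1
        if row is None:
            return
        last_key = row[0]
        yield row
```

Because every swap takes a fresh snapshot, a write that lands mid-stream can show up in later rows, which is why this only suits cases where snapshot isolation is not a concern.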

Adam

> On Jul 14, 2020, at 2:06 PM, San Sato <sans...@inator.biz> wrote:
> 
> Interesting.
> 
> 1. end the response  ("uncleanly") - does this mean the HTTP response
> wouldn't be valid JSON?  I guess the HTTP response code can't be expected
> to reflect a non-normal result.  Maybe in a trailing attribute in json, can
> the response indicate that it's truncated for the reason of txn_too_long,
> to distinguish it from completed responses with less-than-a-page-size (e.g.
> limit=20k, 18k records sent, no more records present)?
> 
> Can bookmark/etc still be included at that point to resume in
> closest-key-order?  Even though it's a streaming, not paginated, response,
> it would match pre-v4 semantics of pagination over multiple http requests,
> right?
> 
> 2. Sending a 400 error seems like a good way to waste requests.  I imagine
> there's no constant limit= that can avoid the issue, so people will have to
> do things that are sensitive to the presence of the limit one way or
> other.  I'd way rather get a partial response with a flag indicating I
> should resume from <x> - but maybe that's the "rewrite the app" scenario
> Nick described designing to avoid.
> 
> 4.  request-level isolation=(TRUE|false) could be a way to express a default
> preference for 1, but allow opt-in for streaming the fresher rows.  I'd
> want to be able to know what kind of boundaries are used for switching to a
> newer txn snapshot - obviously there's a practical outer limit from FDB
> but is it a performance hit to switch with some greater frequency like
> 1000-rows, an FDB index-page-size if there's such a thing, every 250ms or
> similar?
> 
> 
> 
> On Tue, Jul 14, 2020 at 7:18 AM Robert Samuel Newson <rnew...@apache.org>
> wrote:
> 
>> Thanks Nick, very helpful, and it vindicates me opening this thread.
>> 
>> I don't accept Mike Rhodes' argument at all but I should explain why I
>> don't:
>> 
>> In CouchDB 1.x, a response was generated from a single .couch file. There
>> was always a window between the start of the request as the client sees it
>> and CouchDB acquiring a snapshot of the relevant database. I don't think
>> that gap is meaningful, and it does not refute our statements of the time that
>> CouchDB responses are from a snapshot (specifically, that no change to the
>> database made _during_ the response will be visible in _this_ response). In
>> CouchDB 2.x (and continuing in 3.x), a CouchDB database typically consists
>> of multiple shards, each of which, once opened, remains snapshotted for the
>> duration of that response. The difference between 1.x and 2.x/3.x is that
>> the window is potentially larger (though the requests are issued in
>> parallel). The response, however much it returned, was impervious to
>> changes in other requests once it had begun.
>> 
>> I don't think _all_docs, _view or a non-continuous _changes response
>> should allow changes made in other requests to appear midway through them
>> and I want to hear the opinions of folks that have watched over CouchDB
>> from its earliest days on this specific point (If I must name names, at
>> least Adam K, Paul D, Jan L, Joan T). If there's a majority for deviating
>> from this semantic, I will go with the majority.
>> 
>> If we were to agree to preserve the 'single snapshot' behaviour, what
>> would the behaviour be if we can't honour it because of the FoundationDB
>> transaction limits?
>> 
>> I see a few options.
>> 
>> 1) We could end the response uncleanly, mid-response. CouchDB does this
>> when it has no alternative, and it is ugly, but it is usually handled well
>> by clients. They are at least not usually convinced they got a complete
>> response if they are using a competent HTTP client.
>> 
>> 2) We could disavow the streaming API, as you've suggested, and attempt to
>> gather the full response. If we do this within the FDB bounds, return a 200
>> code and the response body. A 400 and an error body if we don't.
>> 
>> 3) We could make the "limit" parameter mandatory and with an upper bound,
>> in combination with 1 or 2, such that a valid request is very likely to be
>> served within the limits.
>> 
>> I'd like to hear more voices on which way we want to break the
>> unachievable semantic of old where you could read _all_docs on a billion
>> document database over, uptime gods willing, a snapshot of the database.
>> 
>> B.
>> 
>>> On 13 Jul 2020, at 21:15, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>> 
>>> Thanks for bringing the topic up for the discussion!
>>> 
>>> For background, this topic was discussed on the mailing list starting
>>> in February, 2019
>>> 
>> https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
>>> 
>>> The primary reason for restart_tx option is to provide compatibility
>>> for _changes feeds to allow older replicators to handle 4.0 sources.
>>> It starts a new transaction after 5 seconds or so (a current FDB
>>> limitation, might go up in the future) and transparently continues to
>>> stream data where it left off. Ex, streaming [a,b,c,d], times out
>>> after b, then it will continue with c, d etc. Currently this is also
>>> used for other streaming APIs as an alternative to returning mangled
>>> JSON after emitting a 200 response and streaming some of the rows.
>>> However, it is not used for paginated responses, the new APIs developed
>>> by Ilya. So users have an option to get the guaranteed snapshot
>>> behavior as well.
>>> 
>>> And for completeness, if we decide to remove the option, we should
>>> specify what happens if we remove it and get a transaction_too_old
>>> exception. Currently the behavior would be to restart the transaction,
>>> resend all the headers and all the rows again down the socket, which I
>>> don't think anyone wants, but is what we'd get if we just make
>>> {restart_tx, false}
>>> 
>>>> I understand that automatically resetting the FDB txn during a response
>>>> is an attempt to work around that and maintain "compatibility" with CouchDB
>>>> < 4 semantics. I think it fails to do so and is very misleading.
>>> 
>>> It is a trade-off in order to keep the same API shape as before. Sure,
>>> streaming all the docs with _all_docs or _changes feeds is not a great
>>> pattern but many applications are implemented that way already.
>>> Letting them migrate to 4.0 without having to rewrite the application
>>> with the caveat that they might see a document updated in the
>>> _all_docs stream after the request has already started, is a nicer
>>> choice, I think, than forcing them to rewrite their application, which
>>> could lead to a python 2/3 scenario.
>>> 
>>> Due to having multiple shards (Q>1), as discussed in the original
>>> mailing thread by Mike
>>> (
>> https://lists.apache.org/thread.html/r8345f534a6fa88c107c1085fba13e660e0e2aedfd206c2748e002664%40%3Cdev.couchdb.apache.org%3E
>> ),
>>> we don't provide a strict read-only snapshot guarantee in 2.x and 3.x
>>> anyway, so users would have to handle scenarios where a document might
>>> appear in the stream that wasn't there at the start of the request
>>> already. Though, granted, a much smaller corner case but I wonder how
>>> many users care to handle that...
>>> 
>>> Currently users do have an option of using the new paginated API which
>>> disables restart_tx behavior
>>> 
>> https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/chttpd/src/chttpd_db.erl#L947
>> ,
>>> though I am not sure what happens when transaction_too_old exception
>>> is thrown then (emit a bookmark?)
>>> 
>>> So based on the compatibility consideration, I'd vote to keep the
>>> restart_tx option (configurable perhaps if we figure out what to do
>>> when it is disabled) in order to allow users to migrate their
>>> application to 4.0. At least informally we promised users to keep a
>>> strong API compatibility when we released 3.0 with an eye towards 4.0
>>> (https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0/). I'd
>>> think not emitting all the data in a _changes or _all_docs response
>>> would break that compatibility more than using multiple transactions.
>>> 
>>> As for what happens when a transaction_too_old is thrown, I could see
>>> an option passed in, something like, single_snapshot=true, and then
>>> use Adam's suggestion to accumulate all the rows in memory and if we
>>> hit the end of the transaction return a 400 error. We won't emit
>>> anything out while rows are accumulated, so users don't get partial
>>> data, it will be every row requested or a 400 error (so no chance of
>>> perceived data loss). Users may retry if they think it was a temporary
>>> hiccup or may use a smaller limit.
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> On Mon, Jul 13, 2020 at 2:05 PM Robert Samuel Newson <rnew...@apache.org>
>> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I'm concerned to see the restart_fold function in fabric2_fdb (
>>>> https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L1828)
>>>> in the 4.0 development branch.
>>>> 
>>>> The upshot of doing this is that a CouchDB response could be taken
>>>> across multiple snapshots of the database, which is not the behaviour of
>>>> CouchDB 1 through 3.
>>>> 
>>>> I don't think this is ok (with the obvious and established exception of
>>>> a continuous changes feed, where new snapshots are continuously visible at
>>>> the end of the response).
>>>> 
>>>> FoundationDB imposes certain limits on transactions, the most notable
>>>> being the 5 second maximum duration. I understand that automatically
>>>> resetting the FDB txn during a response is an attempt to work around that
>>>> and maintain "compatibility" with CouchDB < 4 semantics. I think it fails
>>>> to do so and is very misleading.
>>>> 
>>>> Discuss.
>>>> 
>>>> B.
>>>> 
>> 
>>
