On 2020-07-16 4:50 p.m., Joan Touzet wrote:


On 2020-07-16 2:24 p.m., Robert Samuel Newson wrote:

Agreed on all 4 points. On the final point, it's worth noting that a continuous changes feed was two-phase: the first phase is indeed over a snapshot of the db as of the start of the _changes request, and the second phase is an endless series of subsequent snapshots. The 4.0 behaviour won't exactly match that, but it's definitely in the same spirit.

Agreed also on requiring pagination (I've not reviewed the proposed pagination api in sufficient detail to +1 it yet). Would we start the response as rows are retrieved, though? That's my preference, with an unclean termination if we hit txn_too_old, and an upper bound on the "limit" parameter or equivalent chosen such that txn_too_old is vanishingly unlikely.

On compatibility, there's precedent for a minor release of old branches just to add replicator compatibility. For example, the replicator could call _changes again if it received a complete _changes response (i.e., one that ended with a } that completes the JSON object) that did not include a "last_seq" row. The 4.0 replicator would always do this.
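
To make the check described above concrete, here is a minimal sketch in Python rather than the replicator's own Erlang. The function name and the three outcome labels are hypothetical, invented for illustration; only the "results", "seq" and "last_seq" fields come from the existing _changes response format. It assumes a body that parses as JSON is "complete", and that the resume point is the seq of the last row received.

```python
import json
from typing import Optional, Tuple


def classify_changes_body(body: str) -> Tuple[str, Optional[str]]:
    """Classify a one-shot _changes body (hypothetical helper, not CouchDB code).

    Returns one of:
      ("done", last_seq)    complete JSON that includes "last_seq"
      ("continue", seq)     complete JSON without "last_seq"; call _changes
                            again from the seq of the last row seen
      ("truncated", None)   the body is not a complete JSON object
    """
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # The response ended before the closing }: an unclean termination.
        return ("truncated", None)
    if not isinstance(doc, dict):
        return ("truncated", None)
    if "last_seq" in doc:
        return ("done", doc["last_seq"])
    results = doc.get("results", [])
    # Assumption: resume from the last row's seq; None means start over.
    resume_from = results[-1]["seq"] if results else None
    return ("continue", resume_from)
```

A caller would loop on "continue", passing the returned seq as `since`, and treat "truncated" as a failed request.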

I wouldn't really want to release a new 1.x, would you? Augh.

If we're going to change how replication works, wouldn't it be better to simply say "there is no guaranteed one-shot replication back from 4.x to 1.x"? Or, intentionally break backward compatibility so one-shot replication to un-upgraded old Couches refuses to work at all? This would prevent the confusion by making it clear: you can't do things this way anymore.

Sorry, meant to say we publish that the workaround is: you need either a "push" replication from 4.x -> 1.x, or you must use a hypothetically patched 3.x+ replicator as a "third party" to replicate successfully from 4.x -> non-patched older CouchDBs.

I'd rather support this scenario than have to support explaining why the "one shot" replication back to an old 1.x, when initiated by a 1.x cluster, is returning results "ahead" of the time at which the one-shot replication was started.


We could do a point release of 3.x, sure.

-Joan


B.

On 16 Jul 2020, at 17:25, Paul Davis <paul.joseph.da...@gmail.com> wrote:

From what I'm reading, it sounds like we have general consensus on a few things:

1. A single CouchDB API call should map to a single FDB transaction
2. We absolutely do not want to return a valid JSON response to any
streaming API that hit a transaction boundary (because data
loss/corruption)
3. We're willing to change the API requirements so that 2 is not an issue.
4. None of this applies to continuous changes since that API call was
never a single snapshot.

If everyone generally agrees with that summarization, my suggestion
would be that we just revisit the new pagination APIs and make them
the only behavior rather than having them be opt-in. I believe those
APIs already address all the concerns in this thread and the only
reason we kept the older versions with `restart_tx` was to maintain
API backwards compatibility at the expense of a slight change to
semantics of snapshots. However, if there's a consensus that the
semantics are more important than allowing a blanket `GET
/db/_all_docs` I think it'd make the most sense to just embrace the
pagination APIs that already exist and were written to cover these
issues.
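
As a rough illustration of what "pagination as the only behavior" could look like from a client's side, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `fetch_page`, `MAX_PAGE_SIZE`, and the "rows"/"next" field names are hypothetical and are not taken from the actual pagination API. The point is only that each page is a bounded read, standing in for a single transaction, and that the client follows a bookmark instead of relying on one long streamed response.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical upper bound, chosen small here so the example is easy to follow.
MAX_PAGE_SIZE = 2


def fetch_page(rows: List[str], page_size: int, bookmark: Optional[str]) -> Dict:
    """Hypothetical server side: one bounded read per call.

    The requested page size is clamped so a single call can never read
    more than MAX_PAGE_SIZE rows.
    """
    size = min(page_size, MAX_PAGE_SIZE)
    start = 0 if bookmark is None else int(bookmark)
    chunk = rows[start:start + size]
    end = start + len(chunk)
    # "next" is None on the last page, so every page is a complete response.
    return {"rows": chunk, "next": str(end) if end < len(rows) else None}


def iter_all(rows: List[str], page_size: int) -> Iterator[str]:
    """Hypothetical client side: follow bookmarks until the last page."""
    bookmark = None
    while True:
        page = fetch_page(rows, page_size, bookmark)
        yield from page["rows"]
        bookmark = page["next"]
        if bookmark is None:
            return
```

In this sketch a request for 10 rows still returns at most `MAX_PAGE_SIZE` rows per call, and a full scan becomes a sequence of small, independently valid responses.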

The only thing I'm not 100% on is how to deal with non-continuous
replications. I.e., the older single shot replication. Do we go back
with patches to older replicators to allow 4.0 compatibility? Just
declare that you have to mediate a replication on the newer of the two
CouchDB deployments? Sniff the replicator's UserAgent and behave
differently on 4.x for just that special case?

Paul

On Wed, Jul 15, 2020 at 7:25 PM Adam Kocoloski <kocol...@apache.org> wrote:

Sorry, I also missed that you quoted this specific bit about eagerly requesting a new snapshot. Currently the code will just react to the transaction expiring, then wait till it acquires a new snapshot if “restart_tx” is set (which can take a couple of milliseconds on a FoundationDB cluster that is deployed across multiple AZs in a cloud Region) and then proceed.
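
The react-and-resume behaviour described above can be sketched with a toy model. This is Python, not the CouchDB code, and it does not use the FoundationDB bindings: `FakeSnapshot`, `TransactionTooOld`, and `stream_rows` are hypothetical stand-ins, and expiry is simulated by a read budget rather than by time.

```python
class TransactionTooOld(Exception):
    """Stand-in for the error raised when a snapshot expires."""


class FakeSnapshot:
    """Hypothetical read snapshot that expires after max_reads rows."""

    def __init__(self, rows, max_reads):
        self._rows = sorted(rows)  # frozen copy: the snapshot's view
        self._reads_left = max_reads

    def read_after(self, last_key):
        for key, value in self._rows:
            if last_key is not None and key <= last_key:
                continue  # already streamed from an earlier snapshot
            if self._reads_left == 0:
                raise TransactionTooOld()
            self._reads_left -= 1
            yield key, value


def stream_rows(open_snapshot, restart_tx):
    """Stream rows; on expiry, either fail or resume on a fresh snapshot."""
    last_key = None
    while True:
        snapshot = open_snapshot()
        try:
            for key, value in snapshot.read_after(last_key):
                last_key = key
                yield key, value
            return  # reached the end of the range
        except TransactionTooOld:
            if not restart_tx:
                raise  # unclean termination of the response
            # Otherwise loop: acquire a new snapshot and continue after last_key.
```

Because the second snapshot is taken later, rows written after the request started can appear in the tail of the response, which is the change in isolation guarantees under discussion.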

Adam

On Jul 15, 2020, at 6:54 PM, Adam Kocoloski <kocol...@apache.org> wrote:

Right now the code has an internal “restart_tx” flag that is used to automatically request a new snapshot if the original one expires and continue streaming the response. It can be used for all manner of multi-row responses, not just _changes.

As this is a pretty big change to the isolation guarantees provided by the database, Bob volunteered to elevate the issue to the mailing list for a deeper discussion.

Cheers, Adam

On Jul 15, 2020, at 11:38 AM, Joan Touzet <woh...@apache.org> wrote:

I'm having trouble following the thread...

On 14/07/2020 14:56, Adam Kocoloski wrote:
For cases where you’re not concerned about the snapshot isolation (e.g. streaming an entire _changes feed), there is a small performance benefit to requesting a new FDB transaction asynchronously before the old one actually times out and swapping over to it. That’s a pattern I’ve seen in other FDB layers but I’m not sure we’ve used it anywhere in CouchDB yet.

How does _changes work right now in the proposed 4.0 code?

-Joan


