Re: [DISCUSS] couchdb 4.0 transactional semantics

Nick Vatamaniuc Fri, 24 Jul 2020 10:28:24 -0700

Great discussion everyone!

For normal replications, I think it might be nice to make an exception
and allow server-side pagination for compatibility at first, with a
new option to explicitly enable strict snapshot behavior. Then, in a
later release, we'd make strict snapshots the default to match _all_docs and _view reads.
In other words, for a short while we'd support bi-directional
replications between 4.x and 1/2/3.x on any replicator and document
that fact; then, after a while, we'd switch that capability off and
users would have to run replications on a 4.x replicator only, or on
specially updated 3.x replicators.
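For illustration, here's a toy Python sketch of the two modes. It assumes a changes reader that either surfaces transaction expiry to the client (strict snapshot) or transparently resumes from the last emitted sequence in a new transaction (server-side pagination). `ToyDb`, `changes()` and the `strict_snapshot` flag are hypothetical names, `TransactionTooOld` only stands in for FDB's transaction_too_old error, and expiry is modeled as a fixed row budget per transaction rather than a clock.

```python
# Hypothetical sketch, not CouchDB or FoundationDB code.
from dataclasses import dataclass


class TransactionTooOld(Exception):
    """Stand-in for FoundationDB's transaction_too_old error."""


@dataclass
class ToyDb:
    rows: list          # (seq, doc_id) pairs sorted by seq
    rows_per_tx: int    # rows one toy transaction can read before it "expires"

    def range_read(self, start_after):
        """Yield rows with seq > start_after until this toy transaction expires."""
        budget = self.rows_per_tx
        for seq, doc_id in self.rows:
            if seq <= start_after:
                continue
            if budget == 0:
                raise TransactionTooOld()
            budget -= 1
            yield seq, doc_id


def changes(db, since=0, strict_snapshot=False):
    """Stream changes after `since`, optionally paginating on the server."""
    last_seq = since
    while True:
        try:
            for seq, doc_id in db.range_read(last_seq):
                last_seq = seq
                yield seq, doc_id
            return  # reached the end of the feed
        except TransactionTooOld:
            if strict_snapshot:
                raise  # strict mode: the client must paginate itself via `since`
            # Compatibility mode: loop around and resume after the last emitted
            # row in a new transaction, which may also see newer writes.
```

In the non-strict mode, a row written between two transactions would show up in a later page, which is exactly the behavior we'd need to document.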

> I'd rather support this scenario than have to support explaining why the "one 
> shot" replication back to an old 1.x, when initiated by a 1.x cluster, is 
> returning results "ahead" of the time at which the one-shot replication was 
> started.

Ah, that won't happen in the current fdb prototype branch
implementation. What might happen is that the changes feed would include
changes that happened _after_ the request started. That
won't be any different from what happens if the node where the replication
runs restarts, or there is a network glitch: the changes feed would resume
from the last checkpoint, see changes that happened after the initial
starting sequence, and apply them in order (e.g. if document "a" was
deleted, then updated, then deleted again, each change would be
applied incrementally to the target).
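A toy model of that restart behavior, assuming each change carries the document's full new body (`None` meaning deleted) and that the replicator checkpoints the last applied sequence after each batch. `replicate()`, `crash_after` and the dict-based target are hypothetical; deletions are kept as `{"_deleted": True}` tombstones rather than dropped.

```python
# Hypothetical sketch: resume replication from the last checkpoint.


def replicate(source_changes, target, checkpoint, batch_size=2, crash_after=None):
    """Apply source changes newer than checkpoint["seq"] to target, in seq order.

    source_changes: (seq, doc_id, body) tuples sorted by seq; body None = deleted.
    crash_after: stop after that many batches to simulate a restart or glitch.
    """
    pending = [c for c in source_changes if c[0] > checkpoint["seq"]]
    for n, i in enumerate(range(0, len(pending), batch_size), start=1):
        batch = pending[i:i + batch_size]
        for _seq, doc_id, body in batch:
            # Keep a tombstone for deletions instead of dropping the doc.
            target[doc_id] = {"_deleted": True} if body is None else body
        checkpoint["seq"] = batch[-1][0]  # checkpoint only after the batch is applied
        if crash_after is not None and n >= crash_after:
            return
```

After the "restart" the target converges on the source's latest state, including changes made after the replication originally started.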

We'd have to document the fact that a single-snapshot replication from
4.x -> 1/2/3.x is impossible anyway (unless we do the trick where we
compare update sequences to check that the db was not updated in the
meantime, or the new FDB storage engine allows it). The question then
becomes whether we allow pagination to happen on the client or on the
server. In the case of normal replication, I think it would be nice to
let it happen on the server for a while, to allow for maximum replication
interoperability at first.
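A sketch of the update sequence trick, assuming each page is read in its own transaction and the db exposes a single update sequence that changes on every write (modeled here as an integer counter). If the sequence is the same before and after the paginated read, no write landed in between, so the pages together match one snapshot. `ToyDb`, `consistent_read()` and the `between_pages` hook are hypothetical.

```python
# Hypothetical sketch: detect whether a paginated read saw a single snapshot.


class ToyDb:
    def __init__(self, docs):
        self.docs = dict(docs)
        self.update_seq = 0  # bumped on every write

    def write(self, doc_id, body):
        self.docs[doc_id] = body
        self.update_seq += 1

    def page(self, start_after, limit):
        """One page of (doc_id, body) rows, as if read in its own transaction."""
        keys = sorted(k for k in self.docs if start_after is None or k > start_after)
        return [(k, self.docs[k]) for k in keys[:limit]]


def consistent_read(db, limit, between_pages=None, max_attempts=3):
    """Paginate through db; accept the result only if no write happened meanwhile."""
    for _ in range(max_attempts):
        seq_before = db.update_seq
        rows, last_key = [], None
        while True:
            page = db.page(last_key, limit)
            if not page:
                break
            rows.extend(page)
            last_key = page[-1][0]
            if between_pages is not None:
                between_pages(db)  # test hook: simulate concurrent writers
        if db.update_seq == seq_before:
            return rows  # unchanged sequence: pages form one consistent view
    raise RuntimeError("db kept changing; could not get a consistent snapshot")
```

On a busy db this may never succeed, which is why it can only be a best-effort trick rather than a guarantee.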

> For cases where you’re not concerned about the snapshot isolation (e.g. 
> streaming an entire _changes feed), there is a small performance benefit to 
> requesting a new FDB transaction asynchronously before the old one actually 
> times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
> layers but I’m not sure we’ve used it anywhere in CouchDB yet.

Good point, Adam. We could optimize that part, yeah: fetch a new read
version (GRV) after 4.9 seconds or so and keep it ready to go, for example.
So far we've reacted to the transaction_too_old exception rather than
starting a timer, in order to use the maximum time a tx is alive and save
a few seconds or milliseconds. That required some tricks, such as handling
the exception bubbling up from either the range read itself or from the
user's callback (say, if user code in the callback fetched a doc body and
that blew up with a transaction_too_old exception). As an interesting
aside, in quick experiments I noticed we were able to stream about
100-150k rows from a single tx snapshot, which I thought wasn't too bad.
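A sketch of the "keep a GRV ready" idea, assuming a roughly 5 second read window per transaction (FoundationDB's default) and a `get_read_version` callable that stands in for a GRV request; the clock is injected so the example is deterministic. `ReadVersionRotator` and its parameters are hypothetical, and this is the proactive timer approach, not the reactive transaction_too_old handling the prototype uses so far.

```python
# Hypothetical sketch: keep the next read version ready before the current
# transaction's read window expires. Not FoundationDB binding code.
from concurrent.futures import ThreadPoolExecutor

READ_WINDOW = 5.0  # seconds; FoundationDB's default transaction read window


class ReadVersionRotator:
    def __init__(self, get_read_version, clock, prefetch_after=4.9):
        if not 0 <= prefetch_after <= READ_WINDOW:
            raise ValueError("prefetch_after must be within the read window")
        self._get_rv = get_read_version    # stand-in for a GRV request
        self._clock = clock                # injected so tests are deterministic
        self._prefetch_after = prefetch_after
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None               # (requested_at, Future) once prefetched
        self.read_version = get_read_version()
        self.started_at = clock()

    def current(self):
        """Return a read version whose window has not expired, swapping if needed."""
        now = self._clock()
        age = now - self.started_at
        if age >= self._prefetch_after and self._pending is None:
            # Ask for the next read version in the background, ahead of expiry.
            self._pending = (now, self._pool.submit(self._get_rv))
        if age >= READ_WINDOW:
            requested_at, future = self._pending
            version = future.result()  # blocks only if the prefetch is in flight
            if now - requested_at >= READ_WINDOW:
                # The prefetched version went stale too; fetch a fresh one now.
                requested_at, version = now, self._get_rv()
            # A read version's window starts when it was requested, not swapped in.
            self.read_version, self.started_at = version, requested_at
            self._pending = None
        return self.read_version

    def close(self):
        self._pool.shutdown()
```

The trade-off is the one described above: swapping early gives up a little of each transaction's lifetime in exchange for not stalling on a GRV round trip mid-stream.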

Speaking of replication, I am trying to see what the replicator might
look like in 4.x in https://github.com/apache/couchdb/pull/3015
(prototype/fdb-replicator branch). It's very much a WIP and a hot mess
currently; I'll issue an RFC once I have a better handle on its
general shape. So far it's based on couch_jobs, with a global
queue, and it looks like it might be smaller overall, since it leverages
the scheduling capabilities already present in couch_jobs. Once started,
though, an individual replication job's process hierarchy is largely
the same as before.
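A rough model of the global queue idea, assuming jobs are claimed by whichever worker asks first and can be released back to the queue if their node goes away. `GlobalQueue`, `Job`, `accept()` and `release()` are illustrative names only; they are not meant to mirror the actual couch_jobs API.

```python
# Hypothetical sketch: one global queue of replication jobs shared by all nodes.
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    state: str = "pending"
    owner: Optional[str] = None


class GlobalQueue:
    def __init__(self):
        self._jobs = {}
        self._pending = deque()  # FIFO of job ids waiting for a worker

    def add(self, job_id):
        self._jobs[job_id] = Job(job_id)
        self._pending.append(job_id)

    def accept(self, worker):
        """Claim the oldest pending job for `worker`, or return None."""
        if not self._pending:
            return None
        job = self._jobs[self._pending.popleft()]
        job.state, job.owner = "running", worker
        return job

    def finish(self, job_id):
        self._jobs[job_id].state = "finished"

    def release(self, job_id):
        """Put a running job back in the queue, e.g. after its node died."""
        job = self._jobs[job_id]
        job.state, job.owner = "pending", None
        self._pending.append(job_id)
```

The point of the design is that placement falls out of who calls `accept()`; no per-node scheduler has to decide which node runs which replication.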

Cheers,
-Nick

On Wed, Jul 22, 2020 at 8:48 AM Bessenyei Balázs Donát wrote:
>
> On Tue, 21 Jul 2020 at 18:45, Jan Lehnardt wrote:
> > I’m not sure why a URL parameter vs. a path makes a big difference?
> >
> > Do you have an example?
> >
> > Best
> > Jan
> > —
>
> Oh, sure! OpenAPI Generator [1] et al., for example, generate Java
> methods (like [2] out of spec [3]) per path per verb.
> Java's type safety and the way methods are currently generated don't
> really provide an easy way to retrieve multiple kinds of responses, so
> having them separate would help a lot there.
>
>
> Donat
>
> PS. I'm getting self-conscious about discussing this in this thread.
> Should I open a new one?
>
>
> [1] https://openapi-generator.tech/
> [2] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/src/main/java/org/openapitools/client/api/PetApi.java#L606
> [3] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/api/openapi.yaml#L208
