Great discussion, everyone! For normal replications, I think it would be nice to make an exception at first and allow server-side pagination for compatibility. A new option would explicitly enable strict snapshot behavior. Then, in a later release, we'd make strict snapshots the default to match _all_docs and _view reads. In other words, for a short while we'd support bi-directional replications between 4.x and 1/2/3.x on any replicator and document that fact. After that we'd switch the capability off, and users would have to run such replications on a 4.x replicator only, or on specially updated 3.x replicators.
> I'd rather support this scenario than have to support explaining why the "one
> shot" replication back to an old 1.x, when initiated by a 1.x cluster, is
> returning results "ahead" of the time at which the one-shot replication was
> started.

Ah, that won't happen in the current fdb prototype branch implementation. What might happen is that changes made _after_ the request started would show up in the changes feed. That's no different from what happens when the node running the replication restarts, or when there is a network glitch. The changes feed proceeds from the last checkpoint, sees changes that happened after the initial starting sequence, and applies them in order. For example, document "a" was deleted, then updated again, then deleted again, and every change is applied incrementally to the target.

We'd have to document that a single-snapshot replication from 4.x -> 1/2/3.x is impossible anyway. The exceptions would be the trick of comparing update sequences to confirm the db was not updated in the meantime, or a new FDB storage engine that makes it possible. The question then becomes whether the pagination happens on the client or on the server. For normal replications, I think it would be nice to let it happen on the server for a while, to maximize interoperability for initial replications.

> For cases where you’re not concerned about the snapshot isolation (e.g.
> streaming an entire _changes feed), there is a small performance benefit to
> requesting a new FDB transaction asynchronously before the old one actually
> times out and swapping over to it. That’s a pattern I’ve seen in other FDB
> layers but I’m not sure we’ve used it anywhere in CouchDB yet.

Good point, Adam. We could optimize that part, yes. For example, we could fetch a GRV (read version) after 4.9 seconds or so and keep it ready to go.
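The prefetch idea can be sketched as a loop that streams a changes feed in batches. It requests a fresh read version shortly before the current one reaches FDB's roughly five-second limit, then resumes right after the last sequence it emitted. This is a minimal, self-contained Python sketch, not CouchDB code (which is Erlang). `FakeClock`, `FakeChangesStore`, `stream_changes`, and the millisecond constants are all hypothetical stand-ins. The "store" is an in-memory append-only log rather than a real FDB range, and the prefetch resolves immediately instead of being an asynchronous future. The sketch also shows the effect described in the reply to the first quote: a document committed after the stream started becomes visible once the reader hops to the newer read version.

```python
TX_LIFETIME_MS = 5000  # FDB rejects reads at versions older than ~5 s (transaction_too_old)
PREFETCH_AT_MS = 4900  # "fetch a GRV after 4.9 seconds or so"; illustrative, not tuned


class FakeClock:
    """Deterministic millisecond clock so the sketch can be tested without sleeping."""

    def __init__(self):
        self.now_ms = 0

    def time(self):
        return self.now_ms

    def advance(self, ms):
        self.now_ms += ms


class FakeChangesStore:
    """Hypothetical in-memory stand-in for an FDB-backed _changes index.

    Simplification: an append-only log whose sequence numbers double as commit
    versions (a real changes feed keeps only the latest entry per document).
    """

    def __init__(self):
        self.log = []       # list of (seq, doc_id)
        self.grv_calls = 0  # how many read versions were requested

    def commit(self, doc_id):
        self.log.append((len(self.log) + 1, doc_id))

    def get_read_version(self):
        # A GRV observes everything committed so far.
        self.grv_calls += 1
        return len(self.log)

    def range_read(self, read_version, after_seq, limit):
        # Snapshot read: changes committed after read_version are invisible.
        visible = [row for row in self.log if after_seq < row[0] <= read_version]
        return visible[:limit]


def stream_changes(store, clock, since=0, batch=1, row_cost_ms=100, on_row=None):
    """Stream changes after `since`, hopping to a fresh read version when the
    current one reaches TX_LIFETIME_MS. Returns (seq, doc_id, read_version) tuples.
    Expiry is only checked between batches in this sketch."""
    read_version, started = store.get_read_version(), clock.time()
    prefetched = None
    last_seq, out = since, []
    while True:
        elapsed = clock.time() - started
        if prefetched is None and elapsed >= PREFETCH_AT_MS:
            # Real FDB bindings return a future here; the sketch resolves it at once.
            prefetched = store.get_read_version()
        if elapsed >= TX_LIFETIME_MS:
            # The current read version is too old to use: swap to the prefetched
            # one (issued this iteration if a slow row skipped the prefetch window)
            # and resume right after the last emitted seq.
            read_version, started, prefetched = prefetched, clock.time(), None
        rows = store.range_read(read_version, last_seq, batch)
        if not rows:
            return out
        for seq, doc_id in rows:
            out.append((seq, doc_id, read_version))
            last_seq = seq
            if on_row is not None:
                on_row(seq, doc_id)  # user callback, e.g. a replicator applying the change
            clock.advance(row_cost_ms)  # simulated per-row processing time
```

With `batch=1` and 100 ms per row, the prefetch fires at 4,900 ms and the swap at 5,000 ms, so a 60-row stream issues exactly two GRVs. The reactive alternative (catching `transaction_too_old` and restarting from `last_seq`) avoids the timer, but it has to handle the error wherever it surfaces, including inside `on_row`.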
So far we've reacted to the transaction_too_old exception rather than starting a timer. That lets us use the full lifetime of a transaction and saves a few seconds or milliseconds. It did require some tricks, such as handling the exception when it bubbles up either from the range read itself or from the user's callback. For example, user code in the callback might fetch a doc body, and that read might blow up with a transaction_too_old exception. As an interesting aside, in quick experiments I noticed we were able to stream about 100-150k rows from a single tx snapshot, which I thought wasn't too bad.

Speaking of replication, I am trying to work out what the replicator might look like in 4.x in https://github.com/apache/couchdb/pull/3015 (the prototype/fdb-replicator branch). It's very much a WIP and a hot mess currently. I'll issue an RFC once I have a better handle on its general shape. So far it's based on couch_jobs, with a global queue. It looks like it might be smaller overall, since it leverages the scheduling capabilities already present in couch_jobs. Once started, though, an individual replication job's process hierarchy is largely the same as before.

Cheers,
-Nick

On Wed, Jul 22, 2020 at 8:48 AM Bessenyei Balázs Donát <bes...@apache.org> wrote:
>
> On Tue, 21 Jul 2020 at 18:45, Jan Lehnardt <j...@apache.org> wrote:
> > I’m not sure why a URL parameter vs. a path makes a big difference?
> >
> > Do you have an example?
> >
> > Best
> > Jan
>
> Oh, sure! OpenAPI Generator [1] and et al. for example generate Java
> methods (like [2] out of spec [3]) per path per verb.
> Java's type safety and the way methods are currently generated don't
> really provide an easy way to retrieve multiple kinds of responses, so
> having them separate would help a lot there.
>
>
> Donat
>
> PS. I'm getting self-conscious about discussing this in this thread.
> Should I open a new one?
>
> [1] https://openapi-generator.tech/
> [2] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/src/main/java/org/openapitools/client/api/PetApi.java#L606
> [3] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/api/openapi.yaml#L208