Hi Joan, no need for apologies! Snipping out a few bits:

> One alternative is to always keep just one around, and constantly update
> it every 5s, whether it's used or not (idle server).
Agreed, I see no reason to keep multiple old read versions around on a given CouchDB node. Updating it every second or two could be a nice thing to do (I wouldn’t wait 5 seconds because handing out a read version 4.95 seconds old isn’t very useful to anyone ;). I’ve put a rough sketch of what that caching might look like in a P.S. at the bottom of this mail.

> This second option seems better, but as mentioned later we don't want it
> to be a transparent FDB token (or convertible into one). This parallels
> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?

In our current design we _do_ expose FDB versions pretty directly as database update sequences (there’s a small prefix that lets _changes stay monotonically increasing when relocating a database to a new FDB cluster). I believe it’s worth thinking about expanding the use of sequences to other places in the API, since sequences are a concept that’s already familiar to our users.

> If we eschew API changes for 4.0 then we need to decide on the default. And if
> we're voting, I'd say making RYWs the default (never hanging onto a
> handle) and then (ab-)using stale=ok or whatever state we have lying
> around might be sufficient.

I definitely agree. We should not be using old read versions without the client’s knowledge unless it’s for some internal process where we know all the tradeoffs.

> This is the really important data point here for me. While Cloudant
> cares about 2-3 extra ms on the server side, many many MANY CouchDB
> users don't. Can we benchmark what this looks like when running
> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
> about the average laptop/desktop? Or is it only 2-3ms on a beefy
> Cloudant-sized server?

I don’t have hard performance numbers, but I expect that acquiring a read version in a small-scale deployment is faster than the same operation against a big FoundationDB deployment spanning zones in a cloud region. When you scale down to, e.g., a single FDB process, that one process ends up playing all the roles that need to collaborate to decide on a read version, so the network latency is taken out of the picture.

Cheers, Adam

> On Sep 25, 2019, at 11:57 AM, Joan Touzet <woh...@apache.org> wrote:
> 
> I apologize in advance. I am finding it very very difficult to allocate
> the time and energy necessary to go deep into any of these topics, and
> got lost halfway thru Mike Rhodes' email :( So I'm replying to Adam's
> initial email which is the only one I've fully digested.
> 
> On 2019-09-19 18:11, Adam Kocoloski wrote:
>> Hi all,
>> 
>> As we’ve gotten more familiar with FoundationDB we’ve come to realize that
>> acquiring a read version at the beginning of a transaction is a relatively
>> expensive[*] operation. It’s also a challenging one to scale given the
>> amount of communication required between proxies and tlogs in order to agree
>> on a good version. The prototype CouchDB layer we’ve been working on (i.e.,
>> the beginnings of CouchDB 4.0) uses a separate FDB transaction with a new
>> read version for every request made to CouchDB. I wanted to start a
>> discussion about ways we might augment that approach while preserving (or
>> even enhancing) the semantics that we can expose to CouchDB users.
>> 
>> One thing we can do is cache known versions that FDB has supplied in the
>> past second in the CouchDB layer and reuse those when a client permits us to
>> do so. If you like, this is the modern version of `?stale=ok`, but now
>> applicable to all types of requests.
>> One big downside of this approach is
>> that if you scale out the members of the CouchDB layer they’ll have
>> different views of recent FDB versions, and a client whose requests are
>> load-balanced across members won’t have any guarantee that time moves
>> forward from request to request. You could imagine gossiping versions
>> between layer members, but now you’re basically redoing the work that
>> FoundationDB is doing itself.
> 
> Keeping extra state alive in the CouchDB runtime is something we've
> always avoided. Maybe if someone's doing keepalives, but even then, if
> that "someone" is a reverse proxy server, it could have unintended
> consequences.
> 
> One alternative is to always keep just one around, and constantly update
> it every 5s, whether it's used or not (idle server).
> 
> Read Your Writes has been one of the biggest requests for CouchDB for
> ages, and we're finally in a place to provide it. The secondary question
> on my mind is: is that the default, or is old behaviour the default, or
> is it a configurable default?
> 
>> Another approach is to communicate the FDB version as part of the response
>> to each request, and allow the client to set an FDB version as part of a
>> submitted request. Clients that do this will experience lower latencies for
>> requests 2..N that share a version, will have the benefit of a consistent
>> snapshot of the database for all the reads that are executed using the same
>> version, and can guarantee they read their own writes when interleaving
>> those operations (assuming any reads following a write use the new FDB
>> version associated with the write).
> 
> This second option seems better, but as mentioned later we don't want it
> to be a transparent FDB token (or convertible into one). This parallels
> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?
> 
>> These techniques are not mutually exclusive; a client could acquire a
>> slightly stale FDB version and then use that for a collection of read
>> requests that would all observe the same consistent snapshot of the
>> database. Also, recall that a CouchDB sequence is now essentially the same
>> as an FDB version, with a little extra metadata to ensure sequences are
>> always monotonically increasing even when moving a database to a different
>> FDB cluster. So if you like, this is about allowing requests to be executed
>> as of a certain sequence (provided that sequence is no more than 5 seconds
>> old).
>> 
>> I’m refraining from proposing any specific API extensions at this point,
>> partly because that’s an easy bikeshed and partly because I think whatever
>> API we’d add would be a primitive that client libraries would use to
>> construct richer semantics around. I’m also biting my tongue and avoiding
>> any detailed discussion of the transactional capabilities that CouchDB could
>> offer by surfacing these versions to clients — but that’s definitely an
>> interesting topic in its own right!
> 
> Mike touches on this and I think it's worth careful consideration. If we
> eschew API changes for 4.0 then we need to decide on the default. And if
> we're voting, I'd say making RYWs the default (never hanging onto a
> handle) and then (ab-)using stale=ok or whatever state we have lying
> around might be sufficient.
> 
>> Curious to hear what you all think. Thanks, Adam
> 
> Thanks Adam, this is great.
> 
>> [*]: I don’t want to come off as alarmist; when I say this operation is
>> “expensive” I mean it might take a couple of milliseconds depending on FDB
>> configuration, and FDB can execute 10s of thousands of these per second
>> without much tuning. But it’s always good to be looking for the next
>> bottleneck :)
> 
> This is the really important data point here for me. While Cloudant
> cares about 2-3 extra ms on the server side, many many MANY CouchDB
> users don't. Can we benchmark what this looks like when running
> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
> about the average laptop/desktop? Or is it only 2-3ms on a beefy
> Cloudant-sized server?
> 
> -Joan
> 
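
P.S. To make the read version caching idea a bit more concrete, here's a rough sketch of the layer-side logic using FoundationDB's Python bindings. Nothing here is real CouchDB code; the helper names (pick_read_version, run_request) and the request flags are made up for illustration. The shape is: refresh a single cached version roughly once a second, hand it out when a client explicitly opts into stale reads, and honor a client-supplied version (or one recovered from an update sequence) so that a series of requests shares one snapshot and reads its own writes.

    import time
    import fdb

    fdb.api_version(620)
    db = fdb.open()

    MAX_AGE = 1.0   # refresh the cached read version roughly every second
    _cached = {"version": None, "ts": 0.0}

    def _refresh_cached_version():
        # Grab a fresh read version from FDB and note when we got it.
        tr = db.create_transaction()
        _cached["version"] = tr.get_read_version().wait()
        _cached["ts"] = time.monotonic()

    def pick_read_version(client_version=None, stale_ok=False):
        # client_version: a version the client echoed back from an earlier
        # response (or derived from an update sequence); gives RYW and a
        # consistent snapshot across a series of requests.
        # stale_ok: the client explicitly allows a slightly old version.
        if client_version is not None:
            return client_version
        if stale_ok and _cached["version"] is not None \
                and time.monotonic() - _cached["ts"] < MAX_AGE:
            return _cached["version"]
        _refresh_cached_version()
        return _cached["version"]

    def run_request(read_fun, client_version=None, stale_ok=False):
        # Execute one request's reads at the chosen version, and hand the
        # version back so the client can reuse it on its next request.
        tr = db.create_transaction()
        version = pick_read_version(client_version, stale_ok)
        tr.set_read_version(version)
        return read_fun(tr), version

A real implementation would also have to handle versions that have aged out on the FDB side (transaction_too_old after ~5 seconds) and would expose an opaque sequence to clients rather than the raw FDB version, but hopefully it illustrates the shape of the thing.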