On 2019-09-26 13:14, Adam Kocoloski wrote:
> Hi Joan, no need for apologies! Snipping out a few bits:
>
>> One alternative is to always keep just one around, and constantly update
>> it every 5s, whether it's used or not (idle server).
>
> Agreed, I see no reason to keep multiple old read versions around on a given
> CouchDB node. Updating it every second or two could be a nice thing to do (I
> wouldn’t wait 5 seconds because handing out a read version 4.95 seconds old
> isn’t very useful to anyone ;).
>
>> This second option seems better, but as mentioned later we don't want it
>> to be a transparent FDB token (or convertible into one). This parallels
>> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?
>
> In our current design we _do_ expose FDB versions pretty directly as database
> update sequences (there’s a small prefix to allow for _changes to stay
> monotonically increasing when relocating a database to a new FDB cluster). I
> believe it’s worth us thinking about expanding the use of sequences to other
> places in the API as those are a concept that’s already pretty familiar to
> our users.
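The "small prefix" that keeps sequences monotonic across a cluster move, combined with the wish that tokens stay opaque, can be sketched roughly as follows. This is a minimal illustration under assumptions, not CouchDB 4.0's actual sequence format: the `Sequence` class, the 32-bit `epoch` field (assumed to be bumped whenever a database is relocated to a new FDB cluster) and the base64url encoding are all hypothetical. Ordering compares `epoch` first, so a relocated database whose new cluster hands out lower FDB versions still produces sequences that sort after everything it emitted before the move.

```python
import base64
import struct
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Sequence:
    """Hypothetical update sequence: (epoch, FDB version), compared in that order.

    `epoch` is assumed to be bumped whenever the database is relocated to a new
    FDB cluster, so sequences keep increasing even if the new cluster's
    versions are lower than the old one's.
    """

    epoch: int
    version: int

    def encode(self) -> str:
        # Big-endian 32-bit epoch + 64-bit version, base64url without padding:
        # clients see an opaque string instead of a raw FDB version.
        raw = struct.pack(">IQ", self.epoch, self.version)
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

    @staticmethod
    def decode(token: str) -> "Sequence":
        padded = token + "=" * (-len(token) % 4)
        raw = base64.urlsafe_b64decode(padded)  # may raise binascii.Error (a ValueError)
        if len(raw) != struct.calcsize(">IQ"):
            raise ValueError("malformed sequence token")
        epoch, version = struct.unpack(">IQ", raw)
        return Sequence(epoch, version)


before_move = Sequence(epoch=0, version=9_000_000_000)
after_move = Sequence(epoch=1, version=1_000)  # new cluster, lower raw version
```

Base64 only makes the token opaque by convention; anyone can still decode it, so documentation would still need to say that hand-crafted values are unsupported.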
Did users ever craft their own 2.x db update sequence tokens to abuse the
system? Probably not, because our clustering code was hard to understand.

Did users ever craft their own 1.x db update sequence values? Yes, and it
caused lots of problems.

Does this prevent implementing the CouchDB API on any other backend? If so,
I'd be -1.

In other words, at the very least we need to reinforce that the token is
opaque, and that manipulating it can both produce undefined errors and
potentially lead to (perceived?) data loss.

>> If we eschew API changes for 4.0 then we need to decide on the default. And if
>> we're voting, I'd say making RYWs the default (never hanging onto a
>> handle) and then (ab-)using stale=ok or whatever state we have lying
>> around might be sufficient.
>
> I definitely agree. We should not be using old read versions without the
> client’s knowledge unless it's for some internal process where we know all
> the tradeoffs.
>
>> This is the really important data point here for me. While Cloudant
>> cares about 2-3 extra ms on the server side, many many MANY CouchDB
>> users don't. Can we benchmark what this looks like when running
>> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
>> about the average laptop/desktop? Or is it only 2-3ms on a beefy
>> Cloudant-sized server?
>
> I don’t have hard performance numbers, but I expect that acquiring a read
> version in a small-scale deployment is faster than the same operation against
> a big FoundationDB deployment spanning zones in a cloud region. When you
> scale down e.g. to a single FDB process that process ends up playing all the
> roles that need to collaborate to decide on a read version and so the network
> latency gets taken out of the picture.

Then I'm concerned this is premature optimization.

> Cheers, Adam
>
>> On Sep 25, 2019, at 11:57 AM, Joan Touzet <woh...@apache.org> wrote:
>>
>> I apologize in advance.
>> I am finding it very very difficult to allocate
>> the time and energy necessary to go deep into any of these topics, and
>> got lost halfway thru Mike Rhodes' email :( So I'm replying to Adam's
>> initial email which is the only one I've fully digested.
>>
>> On 2019-09-19 18:11, Adam Kocoloski wrote:
>>> Hi all,
>>>
>>> As we’ve gotten more familiar with FoundationDB we’ve come to realize that
>>> acquiring a read version at the beginning of a transaction is a relatively
>>> expensive[*] operation. It’s also a challenging one to scale given the
>>> amount of communication required between proxies and tlogs in order to
>>> agree on a good version. The prototype CouchDB layer we’ve been working on
>>> (i.e., the beginnings of CouchDB 4.0) uses a separate FDB transaction with
>>> a new read version for every request made to CouchDB. I wanted to start a
>>> discussion about ways we might augment that approach while preserving (or
>>> even enhancing) the semantics that we can expose to CouchDB users.
>>>
>>> One thing we can do is cache known versions that FDB has supplied in the
>>> past second in the CouchDB layer and reuse those when a client permits us
>>> to do so. If you like, this is the modern version of `?stale=ok`, but now
>>> applicable to all types of requests. One big downside of this approach is
>>> that if you scale out the members of the CouchDB layer they’ll have
>>> different views of recent FDB versions, and a client whose requests are
>>> load-balanced across members won’t have any guarantee that time moves
>>> forward from request to request. You could imagine gossiping versions
>>> between layer members, but now you’re basically redoing the work that
>>> FoundationDB is doing itself.
>>
>> Keeping extra state alive in the CouchDB runtime is something we've
>> always avoided. Maybe if someone's doing keepalives, but even then, if
>> that "someone" is a reverse proxy server, it could have unintended
>> consequences.
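The per-node version cache from the original proposal, and the load-balancing hazard that comes with it, can be illustrated with a small simulation. Everything here is an assumption for illustration: `ReadVersionCache`, its `fetch` callable (a hypothetical stand-in for acquiring a fresh read version from FoundationDB, not a real binding call), the 1-second `max_age` and the `FakeClock`. It refreshes lazily on demand rather than on a background timer, so an idle node holds no live state beyond one cached integer.

```python
import time
from typing import Callable, Optional, Tuple


class ReadVersionCache:
    """Caches at most one recently acquired read version on a layer node.

    `fetch` is a hypothetical stand-in for asking FoundationDB for a fresh
    read version; `clock` returns seconds (injectable for testing).
    """

    def __init__(self, fetch: Callable[[], int], max_age: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._cached: Optional[Tuple[int, float]] = None  # (version, acquired_at)

    def get(self, allow_stale: bool) -> int:
        now = self._clock()
        if allow_stale and self._cached is not None:
            version, acquired_at = self._cached
            if now - acquired_at <= self._max_age:
                return version  # client opted in, so skip the round trip
        # Either the client wants a fresh version or the cached one expired.
        version = self._fetch()
        self._cached = (version, now)
        return version


class FakeClock:
    """Manually advanced clock for the simulation below."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


# Two layer nodes share one simulated FDB version source.
versions = iter(range(100, 10_000, 10))
clock = FakeClock()
node_a = ReadVersionCache(versions.__next__, clock=clock)
node_b = ReadVersionCache(versions.__next__, clock=clock)

a1 = node_a.get(allow_stale=True)  # cache empty: fetches 100
b1 = node_b.get(allow_stale=True)  # cache empty: fetches 110
clock.now = 0.5
a2 = node_a.get(allow_stale=True)  # still cached: returns 100 again
```

A client that is routed to node B and then to node A sees its read version go from 110 back to 100, which is the missing "time moves forward" guarantee described above.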
>>
>> One alternative is to always keep just one around, and constantly update
>> it every 5s, whether it's used or not (idle server).
>>
>> Read Your Writes has been one of the biggest requests for CouchDB for
>> ages, and we're finally in a place to provide it. The secondary question
>> on my mind is: is that the default, or is old behaviour the default, or
>> is it a configurable default?
>>
>>> Another approach is to communicate the FDB version as part of the response
>>> to each request, and allow the client to set an FDB version as part of a
>>> submitted request. Clients that do this will experience lower latencies for
>>> requests 2..N that share a version, will have the benefit of a consistent
>>> snapshot of the database for all the reads that are executed using the same
>>> version, and can guarantee they read their own writes when interleaving
>>> those operations (assuming any reads following a write use the new FDB
>>> version associated with the write).
>>
>> This second option seems better, but as mentioned later we don't want it
>> to be a transparent FDB token (or convertible into one). This parallels
>> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?
>>
>>> These techniques are not mutually exclusive; a client could acquire a
>>> slightly stale FDB version and then use that for a collection of read
>>> requests that would all observe the same consistent snapshot of the
>>> database. Also, recall that a CouchDB sequence is now essentially the same
>>> as an FDB version, with a little extra metadata to ensure sequences are
>>> always monotonically increasing even when moving a database to a different
>>> FDB cluster. So if you like, this is about allowing requests to be executed
>>> as of a certain sequence (provided that sequence is no more than 5 seconds
>>> old).
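The client-supplied version approach can be shown with a toy multi-version store. `MiniMVCC`, `TooOld` and all of their methods are hypothetical stand-ins, not FoundationDB's API: each call to `get_read_version` or `write` simply issues the next integer version and records when it was issued, and the 5-second limit mirrors the one mentioned above. A read pinned to an earlier version sees a consistent snapshot, a read pinned to a write's version sees that write, and a pinned version older than the window is rejected.

```python
import bisect
from typing import Dict, List, Optional, Tuple

MAX_VERSION_AGE_S = 5.0  # the ~5 s limit mentioned in the thread


class TooOld(Exception):
    """A client-supplied version has fallen outside the retention window."""


class MiniMVCC:
    """Toy multi-version key/value store standing in for FoundationDB."""

    def __init__(self) -> None:
        self.now = 0.0  # simulated wall clock in seconds
        self._version = 0
        self._issued_at: Dict[int, float] = {}  # version -> issue time (never pruned in this toy)
        self._history: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(version, value)]

    def _next_version(self) -> int:
        self._version += 1
        self._issued_at[self._version] = self.now
        return self._version

    def get_read_version(self) -> int:
        # Stand-in for the comparatively expensive read-version acquisition.
        return self._next_version()

    def write(self, key: str, value: str) -> int:
        # The commit version is what would be echoed back to the client.
        version = self._next_version()
        self._history.setdefault(key, []).append((version, value))
        return version

    def read(self, key: str, at_version: Optional[int] = None) -> Optional[str]:
        if at_version is None:
            at_version = self.get_read_version()  # default: fresh version sees all writes
        else:
            issued = self._issued_at.get(at_version)
            if issued is None:
                raise ValueError(f"unknown version {at_version}")
            if self.now - issued > MAX_VERSION_AGE_S:
                raise TooOld(at_version)
        # Snapshot read: newest value committed at or before `at_version`.
        entries = self._history.get(key, [])
        idx = bisect.bisect_right([v for v, _ in entries], at_version)
        return entries[idx - 1][1] if idx else None


db = MiniMVCC()
snapshot = db.get_read_version()  # version 1, taken before any write
mine = db.write("doc", "rev-1")   # version 2, returned to the writer
db.write("doc", "rev-2")          # version 3, someone else's later write
```

Reading with `snapshot` returns nothing, reading with `mine` returns `"rev-1"` even though a newer revision exists, and a plain read returns `"rev-2"`.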
>>>
>>> I’m refraining from proposing any specific API extensions at this point,
>>> partly because that’s an easy bikeshed and partly because I think whatever
>>> API we’d add would be a primitive that client libraries would use to
>>> construct richer semantics around. I’m also biting my tongue and avoiding
>>> any detailed discussion of the transactional capabilities that CouchDB
>>> could offer by surfacing these versions to clients — but that’s definitely
>>> an interesting topic in its own right!
>>
>> Mike touches on this and I think it's worth careful consideration. If we
>> eschew API changes for 4.0 then we need to decide on the default. And if
>> we're voting, I'd say making RYWs the default (never hanging onto a
>> handle) and then (ab-)using stale=ok or whatever state we have lying
>> around might be sufficient.
>>
>>> Curious to hear what you all think.
>>>
>>> Thanks, Adam
>>
>> Thanks Adam, this is great.
>>
>>> [*]: I don’t want to come off as alarmist; when I say this operation is
>>> “expensive” I mean it might take a couple of milliseconds depending on FDB
>>> configuration, and FDB can execute 10s of thousands of these per second
>>> without much tuning. But it’s always good to be looking for the next
>>> bottleneck :)
>>
>> This is the really important data point here for me. While Cloudant
>> cares about 2-3 extra ms on the server side, many many MANY CouchDB
>> users don't. Can we benchmark what this looks like when running
>> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
>> about the average laptop/desktop? Or is it only 2-3ms on a beefy
>> Cloudant-sized server?
>>
>> -Joan
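A benchmark of the kind requested here needs little more than a timing loop around a single read-version acquisition, run unchanged on each target machine (RasPi, laptop, server). The `bench` helper below is a generic sketch and the operation passed in is left open; with the FoundationDB Python bindings it would presumably be something like creating a fresh transaction and waiting on its `get_read_version()` future, but check the binding documentation for the exact calls. It reports median, 99th-percentile and maximum latency for one sequential client only, which says nothing about the aggregate throughput figure quoted above.

```python
import statistics
import time
from typing import Callable, Dict


def bench(op: Callable[[], object], iterations: int = 1000,
          warmup: int = 50) -> Dict[str, float]:
    """Call `op` repeatedly from one thread and summarize latency in ms."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):
        op()  # let connections and caches settle before measuring
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 99th percentile: ceil(0.99 * n) - 1, in integer arithmetic.
    p99_index = (99 * len(samples) + 99) // 100 - 1
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Placeholder workload; swap in a real read-version acquisition.
    print(bench(lambda: sum(range(1000))))
```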