Hi Joan, no need for apologies! Snipping out a few bits:

> One alternative is to always keep just one around, and constantly update
> it every 5s, whether it's used or not (idle server).

Agreed, I see no reason to keep multiple old read versions around on a given 
CouchDB node. Updating it every second or two could be a nice thing to do (I 
wouldn’t wait the full 5 seconds — FDB rejects read versions older than about 5 
seconds, so handing out a read version that’s 4.95 seconds old isn’t very 
useful to anyone ;).

> This second option seems better, but as mentioned later we don't want it
> to be a transparent FDB token (or convertible into one). This parallels
> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?

In our current design we _do_ expose FDB versions pretty directly as database 
update sequences (there’s a small prefix that lets _changes sequences stay 
monotonically increasing when relocating a database to a new FDB cluster). I 
believe it’s worth thinking about expanding the use of sequences to other 
places in the API, since sequences are a concept that’s already pretty familiar 
to our users.
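As a rough illustration of that sequence layout — the exact encoding here is an assumption for the sketch, not the real CouchDB 4.0 wire format:

```python
def encode_seq(cluster_epoch, fdb_version):
    """Pack a small epoch prefix and a 64-bit FDB version into one integer.

    The epoch is bumped when a database moves to a new FDB cluster, which
    keeps sequences monotonically increasing even though the new cluster's
    versions restart from an unrelated value.
    """
    return (cluster_epoch << 64) | fdb_version

def decode_seq(seq):
    """Split a sequence back into (cluster_epoch, fdb_version)."""
    return seq >> 64, seq & ((1 << 64) - 1)
```

The point of the prefix is visible in the comparison: a sequence from epoch 1 always sorts after any sequence from epoch 0, regardless of the raw FDB versions involved.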

> If we eschew API changes for 4.0 then we need to decide on the default. And if
> we're voting, I'd say making RYWs the default (never hanging onto a
> handle) and then (ab-)using stale=ok or whatever state we have lying
> around might be sufficient.

I definitely agree. We should not be using old read versions without the 
client’s knowledge unless it's for some internal process where we know all the 
tradeoffs.

> This is the really important data point here for me. While Cloudant
> cares about 2-3 extra ms on the server side, many many MANY CouchDB
> users don't. Can we benchmark what this looks like when running
> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
> about the average laptop/desktop? Or is it only 2-3ms on a beefy
> Cloudant-sized server?

I don’t have hard performance numbers, but I expect that acquiring a read 
version in a small-scale deployment is faster than the same operation against a 
big FoundationDB deployment spanning zones in a cloud region. When you scale 
down to, e.g., a single FDB process, that one process ends up playing all the 
roles that must collaborate to agree on a read version, so the network latency 
drops out of the picture entirely.

Cheers, Adam

> On Sep 25, 2019, at 11:57 AM, Joan Touzet <woh...@apache.org> wrote:
> 
> I apologize in advance. I am finding it very very difficult to allocate
> the time and energy necessary to go deep into any of these topics, and
> got lost halfway thru Mike Rhodes' email :( So I'm replying to Adam's
> initial email which is the only one I've fully digested.
> 
> On 2019-09-19 18:11, Adam Kocoloski wrote:
>> Hi all,
>> 
>> As we’ve gotten more familiar with FoundationDB we’ve come to realize that 
>> acquiring a read version at the beginning of a transaction is a relatively 
>> expensive[*] operation. It’s also a challenging one to scale given the 
>> amount of communication required between proxies and tlogs in order to agree 
>> on a good version. The prototype CouchDB layer we’ve been working on (i.e., 
>> the beginnings of CouchDB 4.0) uses a separate FDB transaction with a new 
>> read version for every request made to CouchDB. I wanted to start a 
>> discussion about ways we might augment that approach while preserving (or 
>> even enhancing) the semantics that we can expose to CouchDB users.
>> 
>> One thing we can do is cache known versions that FDB has supplied in the 
>> past second in the CouchDB layer and reuse those when a client permits us to 
>> do so. If you like, this is the modern version of `?stale=ok`, but now 
>> applicable to all types of requests. One big downside of this approach is 
>> that if you scale out the members of the CouchDB layer they’ll have 
>> different views of recent FDB versions, and a client whose requests are 
>> load-balanced across members won’t have any guarantee that time moves 
>> forward from request to request. You could imagine gossiping versions 
>> between layer members, but now you’re basically redoing the work that 
>> FoundationDB is doing itself.
> 
> Keeping extra state alive in the CouchDB runtime is something we've
> always avoided. Maybe if someone's doing keepalives, but even then, if
> that "someone" is a reverse proxy server, it could have unintended
> consequences.
> 
> One alternative is to always keep just one around, and constantly update
> it every 5s, whether it's used or not (idle server).
> 
> Read Your Writes has been one of the biggest requests for CouchDB for
> ages, and we're finally in a place to provide it. The secondary question
> on my mind is: is that the default, or is old behaviour the default, or
> is it a configurable default?
> 
>> Another approach is to communicate the FDB version as part of the response 
>> to each request, and allow the client to set an FDB version as part of a 
>> submitted request. Clients that do this will experience lower latencies for 
>> requests 2..N that share a version, will have the benefit of a consistent 
>> snapshot of the database for all the reads that are executed using the same 
>> version, and can guarantee they read their own writes when interleaving 
>> those operations (assuming any reads following a write use the new FDB 
>> version associated with the write).
> 
> This second option seems better, but as mentioned later we don't want it
> to be a transparent FDB token (or convertible into one). This parallels
> the nonce approach we use in _changes feeds to ensure a stable feed, yeah?
> 
>> These techniques are not mutually exclusive; a client could acquire a 
>> slightly stale FDB version and then use that for a collection of read 
>> requests that would all observe the same consistent snapshot of the 
>> database.  Also, recall that a CouchDB sequence is now essentially the same 
>> as an FDB version, with a little extra metadata to ensure sequences are 
>> always monotonically increasing even when moving a database to a different 
>> FDB cluster. So if you like, this is about allowing requests to be executed 
>> as of a certain sequence (provided that sequence is no more than 5 seconds 
>> old).
>> 
>> I’m refraining from proposing any specific API extensions at this point, 
>> partly because that’s an easy bikeshed and partly because I think whatever 
>> API we’d add would be a primitive that client libraries would use to 
>> construct richer semantics around. I’m also biting my tongue and avoiding 
>> any detailed discussion of the transactional capabilities that CouchDB could 
>> offer by surfacing these versions to clients — but that’s definitely an 
>> interesting topic in its own right!
> 
> Mike touches on this and I think it's worth careful consideration. If we
> eschew API changes for 4.0 then we need to decide on the default. And if
> we're voting, I'd say making RYWs the default (never hanging onto a
> handle) and then (ab-)using stale=ok or whatever state we have lying
> around might be sufficient.
> 
>> Curious to hear what you all think. Thanks, Adam
> 
> Thanks Adam, this is great.
> 
>> [*]: I don’t want to come off as alarmist; when I say this operation is 
>> “expensive” I mean it might take a couple of milliseconds depending on FDB 
>> configuration, and FDB can execute 10s of thousands of these per second 
>> without much tuning. But it’s always good to be looking for the next 
>> bottleneck :)
> 
> This is the really important data point here for me. While Cloudant
> cares about 2-3 extra ms on the server side, many many MANY CouchDB
> users don't. Can we benchmark what this looks like when running
> FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What
> about the average laptop/desktop? Or is it only 2-3ms on a beefy
> Cloudant-sized server?
> 
> -Joan
> 
