Hi all,

As we’ve gotten more familiar with FoundationDB we’ve come to realize that 
acquiring a read version at the beginning of a transaction is a relatively 
expensive[*] operation. It’s also a challenging one to scale given the amount 
of communication required between proxies and tlogs in order to agree on a good 
version. The prototype CouchDB layer we’ve been working on (i.e., the 
beginnings of CouchDB 4.0) uses a separate FDB transaction with a new read 
version for every request made to CouchDB. I wanted to start a discussion about 
ways we might augment that approach while preserving (or even enhancing) the 
semantics that we can expose to CouchDB users.

One thing we can do is have the CouchDB layer cache the versions that FDB has 
supplied within the past second and reuse them when a client permits us to do so. 
If you like, this is the modern version of `?stale=ok`, but now applicable to 
all types of requests. One big downside of this approach is that if you scale 
out the members of the CouchDB layer they’ll have different views of recent FDB 
versions, and a client whose requests are load-balanced across members won’t 
have any guarantee that time moves forward from request to request. You could 
imagine gossiping versions between layer members, but now you’re basically 
redoing the work that FoundationDB is doing itself.
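
To make the caching idea concrete, here is a minimal sketch of a per-node version cache. It is an illustration under stated assumptions, not what the prototype does: `ReadVersionCache`, `fetch_version`, and `allow_stale` are hypothetical names, and `fetch_version` stands in for whatever call the layer uses to ask FDB for a fresh read version.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class ReadVersionCache:
    """Reuse a recently fetched read version for up to max_age seconds."""

    def __init__(self, fetch_version: Callable[[], int], max_age: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_version is a hypothetical stand-in for asking FDB for a read version.
        self._fetch_version = fetch_version
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._cached: Optional[Tuple[int, float]] = None  # (version, fetched_at)

    def get(self, allow_stale: bool = False) -> int:
        """Return a cached version if the caller permits it, else fetch a new one."""
        with self._lock:
            now = self._clock()
            if allow_stale and self._cached is not None:
                version, fetched_at = self._cached
                if now - fetched_at <= self._max_age:
                    return version
            version = self._fetch_version()
            # Never let the cached version move backwards on this node.
            if self._cached is None or version >= self._cached[0]:
                self._cached = (version, now)
            return version
```

The sketch holds the lock while fetching, which serializes fetches on a node; that keeps it short but is probably not what you would ship. It also only guarantees monotonic versions within one layer member, which is exactly the load-balancing downside described above.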

Another approach is to communicate the FDB version as part of the response to 
each request, and allow the client to set an FDB version as part of a submitted 
request. Clients that do this will experience lower latencies for requests 2..N 
that share a version, will have the benefit of a consistent snapshot of the 
database for all the reads that are executed using the same version, and can 
guarantee they read their own writes when interleaving those operations 
(assuming any reads following a write use the new FDB version associated with 
the write).
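
A toy model may help show the semantics I have in mind. The sketch below uses an in-memory stand-in for a versioned store (it is not FoundationDB, and `ToyVersionedStore`, `handle_read`, and `handle_write` are hypothetical names, not a proposed API). Every response carries a version, and a read may optionally supply one.

```python
from typing import Dict, List, Optional, Tuple


class ToyVersionedStore:
    """In-memory stand-in for a versioned database (not FoundationDB itself)."""

    def __init__(self) -> None:
        self._version = 0
        # key -> list of (commit_version, value), in commit order
        self._history: Dict[str, List[Tuple[int, str]]] = {}

    def current_version(self) -> int:
        return self._version

    def write(self, key: str, value: str) -> int:
        """Commit a write and return its commit version."""
        self._version += 1
        self._history.setdefault(key, []).append((self._version, value))
        return self._version

    def read_at(self, key: str, version: int) -> Optional[str]:
        """Return the newest value committed at or before `version`."""
        result = None
        for commit_version, value in self._history.get(key, []):
            if commit_version <= version:
                result = value
        return result


def handle_read(store: ToyVersionedStore, key: str,
                version: Optional[int] = None) -> dict:
    """Serve a read, reusing the client's version if one was supplied."""
    if version is None:
        version = store.current_version()  # the "expensive" step in real life
    return {"value": store.read_at(key, version), "version": version}


def handle_write(store: ToyVersionedStore, key: str, value: str) -> dict:
    """Serve a write and report the version at which it committed."""
    return {"ok": True, "version": store.write(key, value)}


store = ToyVersionedStore()
handle_write(store, "a", "1")
first = handle_read(store, "a")                         # acquires a version
handle_write(store, "b", "2")                           # someone else writes
second = handle_read(store, "b", first["version"])      # same snapshot: no "b"
mine = handle_write(store, "a", "3")
third = handle_read(store, "a", mine["version"])        # reads its own write
```

In this toy, `second` does not see the later write to "b" because it reuses the snapshot from `first`, while `third` sees the client's own write because it uses the version returned by that write.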

These techniques are not mutually exclusive; a client could acquire a slightly 
stale FDB version and then use that for a collection of read requests that 
would all observe the same consistent snapshot of the database. Also, recall 
that a CouchDB sequence is now essentially the same as an FDB version, with a 
little extra metadata to ensure sequences are always monotonically increasing 
even when moving a database to a different FDB cluster. So if you like, this is 
about allowing requests to be executed as of a certain sequence (provided that 
sequence is no more than 5 seconds old).
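
As a rough illustration of that 5 second window: FDB versions advance at approximately one million per second, so a layer node could estimate how old a client-supplied version is by comparing it with the newest version that node has seen. This is only a sketch with hypothetical function names, and the estimate is a cheap pre-check. FDB itself remains the authority and will reject a read at a version that is too old.

```python
VERSIONS_PER_SECOND = 1_000_000  # approximate rate at which FDB versions advance
MAX_AGE_SECONDS = 5              # the window mentioned above


def estimated_age_seconds(requested_version: int, newest_known_version: int) -> float:
    """Estimate how far behind `requested_version` is, in seconds."""
    return max(0, newest_known_version - requested_version) / VERSIONS_PER_SECOND


def probably_usable(requested_version: int, newest_known_version: int) -> bool:
    """Cheap pre-check only; a real read can still fail if the version is too old."""
    return estimated_age_seconds(requested_version, newest_known_version) <= MAX_AGE_SECONDS
```

For example, a requested version of 10,000,000 compared with a newest known version of 12,500,000 is estimated to be about 2.5 seconds old and would pass the pre-check.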

I’m refraining from proposing any specific API extensions at this point, partly 
because that’s an easy bikeshed and partly because I think whatever API we’d 
add would be a primitive that client libraries would use to construct richer 
semantics around. I’m also biting my tongue and avoiding any detailed 
discussion of the transactional capabilities that CouchDB could offer by 
surfacing these versions to clients — but that’s definitely an interesting 
topic in its own right!

Curious to hear what you all think.

Thanks, Adam

[*]: I don’t want to come off as alarmist; when I say this operation is 
“expensive” I mean it might take a couple of milliseconds depending on FDB 
configuration, and FDB can execute tens of thousands of these per second without 
much tuning. But it’s always good to be looking for the next bottleneck :)
