Hi all,

As we’ve gotten more familiar with FoundationDB we’ve come to realize that acquiring a read version at the beginning of a transaction is a relatively expensive[*] operation. It’s also a challenging one to scale given the amount of communication required between proxies and tlogs in order to agree on a good version. The prototype CouchDB layer we’ve been working on (i.e., the beginnings of CouchDB 4.0) uses a separate FDB transaction with a new read version for every request made to CouchDB. I wanted to start a discussion about ways we might augment that approach while preserving (or even enhancing) the semantics that we can expose to CouchDB users.
One thing we can do is cache, in the CouchDB layer, the versions that FDB has supplied within the past second and reuse them when a client permits us to do so. If you like, this is the modern version of `?stale=ok`, but now applicable to all types of requests. One big downside of this approach is that if you scale out the members of the CouchDB layer they’ll have different views of recent FDB versions, and a client whose requests are load-balanced across members won’t have any guarantee that time moves forward from request to request. You could imagine gossiping versions between layer members, but now you’re basically redoing the work that FoundationDB is doing itself.

Another approach is to communicate the FDB version as part of the response to each request, and allow the client to set an FDB version as part of a submitted request. Clients that do this will experience lower latencies for requests 2..N that share a version, will have the benefit of a consistent snapshot of the database for all the reads that are executed using the same version, and can guarantee they read their own writes when interleaving those operations (assuming any reads following a write use the new FDB version associated with the write).

These techniques are not mutually exclusive; a client could acquire a slightly stale FDB version and then use that for a collection of read requests that would all observe the same consistent snapshot of the database. Also, recall that a CouchDB sequence is now essentially the same as an FDB version, with a little extra metadata to ensure sequences are always monotonically increasing even when moving a database to a different FDB cluster. So if you like, this is about allowing requests to be executed as of a certain sequence (provided that sequence is no more than 5 seconds old).
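To make the version-caching idea a bit more concrete, here is a minimal Python sketch of what a single layer member might do. Everything in it is hypothetical: the `ReadVersionCache` class, the `fetch_version` callable, and the `allow_stale`, `min_version`, and `max_age` parameters are names invented for illustration, not a proposed API. In a real layer `fetch_version` would presumably wrap something like `tr.get_read_version()` from the FDB Python bindings, and the returned value would be applied to a new transaction with `tr.set_read_version()`.

```python
import time
from typing import Callable, Optional


class ReadVersionCache:
    """Per-process cache of the most recent read version (hypothetical helper)."""

    def __init__(
        self,
        fetch_version: Callable[[], int],
        max_age: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_version stands in for asking FDB for a fresh read version.
        self._fetch_version = fetch_version
        # Cached versions older than max_age seconds are never reused.
        self._max_age = max_age
        self._clock = clock
        self._version: Optional[int] = None
        self._fetched_at = 0.0

    def get(self, allow_stale: bool = False,
            min_version: Optional[int] = None) -> int:
        """Return a read version, reusing the cached one only when permitted.

        allow_stale: the client accepts a version up to max_age seconds old.
        min_version: a version the client has already observed (for example
            the one returned with its last write); the cached version is only
            reused if it is at least this new.
        """
        now = self._clock()
        fresh_enough = (
            self._version is not None
            and now - self._fetched_at <= self._max_age
        )
        new_enough = min_version is None or (
            self._version is not None and self._version >= min_version
        )
        if allow_stale and fresh_enough and new_enough:
            return self._version

        # Otherwise pay for a fresh version and remember it for later callers.
        version = self._fetch_version()
        self._version = version
        self._fetched_at = now
        return version
```

The `min_version` check is one possible way to soften the load-balancing problem described above: a client that echoes back the last version it saw would never be handed an older cached version by a different layer member, at the cost of an occasional fresh version fetch.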
I’m refraining from proposing any specific API extensions at this point, partly because that’s an easy bikeshed and partly because I think whatever API we’d add would be a primitive around which client libraries would construct richer semantics. I’m also biting my tongue and avoiding any detailed discussion of the transactional capabilities that CouchDB could offer by surfacing these versions to clients, but that’s definitely an interesting topic in its own right!

Curious to hear what you all think.

Thanks, Adam

[*]: I don’t want to come off as alarmist; when I say this operation is “expensive” I mean it might take a couple of milliseconds depending on FDB configuration, and FDB can execute 10s of thousands of these per second without much tuning. But it’s always good to be looking for the next bottleneck :)