Re: [DISCUSS] Rebase CouchDB on top of FoundationDB

Jan Lehnardt Wed, 23 Jan 2019 05:34:18 -0800

Hi Bob,

this is all very exciting!

First up, full disclosure, the CouchDB PMC has had about two weeks to think 
about this already, so if any of the following doesn’t sound like a knee-jerk 
reaction, that’s why.

I’m personally tentatively optimistic about this proposal and I’m willing to 
work through all open questions, from governance and contribution management 
to the technical bits, to see if we as the CouchDB project arrive at a point 
where we are comfortable going down this path.

The PMC has already identified a set of discussion areas for this dev@ mailing 
list to go through before any definite decision can be made. Separate emails 
for those discussions are going to be posted on this list shortly, so I won’t 
go into further detail here.

If anyone sees a need for discussion beyond the threads that will appear here, 
please speak up at your earliest convenience. This proposal would mean a big 
step for our project, and we must make sure to hear all voices.

Once we’ve gone through all this, the answers to the open questions will feed 
into a consensus-finding process on this mailing list, which will signify the 
final project decision.

* * *

That said, I’d like to highlight one of these topics: IBM/Cloudant’s 
contributions going forward.

Looking at how 2.0 came to be, the contributions were mostly taken on good 
faith (and legal review), and from the trust Cloudant built up operating a 
large number of large instances of clusters of what would eventually become 
CouchDB 2.0. It has clearly paid off for CouchDB, and our current level of 
success wouldn’t have been possible without IBM/Cloudant.

However, some of the ways we work with the IBM team leave something to be desired. 
Specifically, the Apache CouchDB community is frequently not involved in design 
discussions around new features. Those happen inside IBM and we “only” get a PR 
that then goes through the regular review process. Again, this has served us 
well, but we can do even better, so I’d like to take the opportunity of this 
larger proposal to suggest we actually do better. As promised, a more detailed 
thread about this is going to come up, and it’ll be the right place to go 
through the minutiae of this.

With this structural change, I believe we are in a great position to work 
through the details of this proposal and the subsequent design and engineering 
steps.

* * *

Finally, I want to reiterate Bob’s point: while this proposal is largely driven 
by IBM, IBM has no power to unilaterally force the CouchDB project to accept 
this proposal and they have already signalled and worked towards making this a 
mutually beneficial endeavour. The CouchDB project has different objectives 
from IBM and it is up to us to come up with a proposal that satisfies all of 
our objectives as well as IBM’s, should this motion pass.

Best
Jan
—

> On 23. Jan 2019, at 11:00, Robert Samuel Newson <[email protected]> wrote:
> 
> Hi,
> 
> CouchDB 2.0 introduced clustering; the ability to scale a single database 
> across multiple nodes, increasing both the maximum size of a database and 
> adding native fault-tolerance. This welcome and considerable step forward was 
> not without its trade-offs. In the years since 2.0 was released, users have 
> frequently encountered the following issues as a direct consequence of the 2.0 
> clustering approach:
> 
> 1. Conflict revisions can be created on normal concurrent updates issued to a 
> single database, since each replica of a database shard independently chooses 
> whether to accept a given update, and all replicas will eventually propagate 
> updates that any one of them has chosen to accept.
> 2. Secondary indexes ("views") do not scale the same way as document lookups, 
> as they are sharded by doc id, not emitted view key (thus forcing a 
> consultation of all shard ranges for each query).
> 3. The changes feed is no longer totally ordered and, worse, could replay 
> earlier changes in the event of a node failure (even a temporary one).
> 
> The idea is to use FoundationDB as the new CouchDB foundational layer, 
> letting it take care of data storage and placement. An introduction to 
> FoundationDB would take up too much space here so I will summarise it as a 
> highly scalable, ordered key-value store with transactional semantics that 
> provides strong consistency and scales from a single node to many. It is 
> licensed under the ASLv2 but is not an Apache project.
> 
> By using FoundationDB we can solve all three of the problems listed above and 
> deliver semantics much closer to CouchDB 1.x's behaviour while improving upon 
> the scalability advantages that 2.0 introduced. The essential character of 
> CouchDB would be preserved (MVCC for documents, replication between CouchDB 
> databases) but the underlying plumbing would change significantly. In 
> addition, this new foundation will allow us to add long wished-for features 
> more easily. For example, multi-document transactions become possible, as 
> does efficient field-level reading and writing. A further thought is the 
> ability to update views transactionally with the database update.
> 
> For those familiar with the CouchDB 2.0 architecture, the proposal is, in 
> effect, to change all the functions in fabric.erl so that they work against a 
> (possibly remote) FoundationDB cluster instead of the current implementation 
> of calling into the original CouchDB 1.x code (couch_btree, couch_file, etc).
> 
> This is a large change and, for full disclosure, the IBM Cloudant team are 
> proposing it. We have done our due diligence in investigating FoundationDB, 
> as well as a detailed investigation into how CouchDB semantics would be built 
> on top of FoundationDB. Any and all decisions on that must take place here on 
> the CouchDB developer mailing list, of course, but we are confident that this 
> is feasible.
> During those investigations we have identified a small number of CouchDB 
> features that we do not yet see a way to do on FoundationDB, the main one 
> being custom (JavaScript) reduces. This is a direct consequence of no longer 
> rolling our own persistence layer (couch_btree and friends) and would likely 
> apply to any alternative technology. 
> 
> I think this would be a great advance for CouchDB, preserving what makes 
> CouchDB special but taking advantage of the superbly engineered FoundationDB 
> software at the bottom of the stack.
> 
> Regards,
> Robert Newson

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
