CouchDb Rewrite/Fork

Reddy B . Tue, 09 Jul 2019 16:07:59 -0700

Hi all,

I've checked the recent discussions and apparently July is the "vision month" 
lol. Hopefully this email will not saturate the patience of the core team.


We have been thinking about forking/rewriting CouchDb internally for quite some 
time now, and this idea has reached a degree of maturity such that I'm pretty 
confident it will materialize at this point. We hesitated between working 
internally and making a big open-sourcing announcement 5-10 years from now, 
once the product is battle tested, and announcing our intentions here 
today.

However, I realized that good things may happen by providing this feedback, and 
that providing this type of feedback also is a way of giving back to the 
community.

The reason for this project is that we have lost confidence in the way the 
vision of CouchDb aligns with our goals. As far as we are concerned, there are 
3 things we loved about CouchDb:

#Map/Reduce

We think that the benefits of Map/Reduce are very underrated. Map/Reduce forces 
developers to approach problems differently and results in much more efficient 
and well-thought-out application architectures and implementations. This is in 
addition to the performance benefits, since indexes are built in advance in a 
very predictable manner (with a few well-documented caveats). For this reason, 
our developers are forbidden from using Mango, and we require them to wrap 
their heads around problems until they are able to solve them in map/reduce mode.
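
To illustrate what we mean by indexes being built in advance, here is a minimal 
sketch in Python. It only imitates the idea: real CouchDb views are normally 
written in JavaScript and stored in an incrementally updated B-tree, and every 
name below (map_order, build_index, reduce_by_key, the sample documents) is 
hypothetical and made up for this example.

```python
from collections import defaultdict


def map_order(doc):
    """Map step: emit (key, value) pairs for one document (hypothetical view)."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_index(docs, map_fn):
    """Run the map step once per document and keep the rows sorted by key.

    This stands in for the index that is built ahead of query time.
    """
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def reduce_by_key(rows):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


# Made-up sample documents, for illustration only.
docs = [
    {"_id": "1", "type": "order", "customer": "alice", "total": 30},
    {"_id": "2", "type": "order", "customer": "bob", "total": 7},
    {"_id": "3", "type": "note", "text": "not an order, so nothing is emitted"},
    {"_id": "4", "type": "order", "customer": "alice", "total": 25},
]

index = build_index(docs, map_order)  # built once, before any query
totals = reduce_by_key(index)         # a query only reads the prepared rows
```

The point of the sketch is that the question ("total per customer") has to be 
decided when the view is written, which is the discipline we are describing.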

However, we can see that the focus of the CouchDb project is increasingly on 
Mango, and we have little confidence in the commitment of the project to 
first-class Map/Reduce support (which was for us a defining aspect 
of the identity of CouchDb).

#Complexity of the codebase

Open-source software that is too complex to be tweaked and hacked is for all 
practical purposes closed-source software. You guys are VERY smart. And by 
nature a database software system is a non-trivial piece of technology.

Initially we felt confident that the codebase was small enough and clean enough 
that should we really need to get our hands dirty in an emergency situation, we 
would be able to do so. Then Mango made the situation a bit blurrier, but we 
could easily ignore that, especially since we do not use it. However with 
FoundationDB... this becomes a whole different story.

The domain model of a database is non-trivial by nature, and now FoundationDb 
will introduce an additional level of abstraction and indirection, and a very 
serious one. I've been reading the design discussions since the FoundationDb 
announcement and there are a lot of impedance mismatches requiring the domain 
model of CouchDb to be broken up into fictitious entities intended to accommodate 
FoundationDb abstractions and their limitations (I'll come back to this point in a 
moment).

Indirection is also introduced at the business logic level, with additional 
steps needing to be followed to emulate the desired behavior. All of this is 
complexity and obfuscation, and to be realistic, if we already struggled with 
the straight-to-the-point implementation, there is no way we'll be able to 
navigate (let alone hack) the FoundationDB-based implementation.

#(Apparent) Non-Alignment of FoundationDb with the reasons that made us love 
CouchDb

FoundationDb introduces limitations regarding transactions, document sizes and 
a number of other critical items. One of the main reasons we use CouchDb is 
the way it allows us to develop applications rapidly and flexibly 
address all the state storage needs of application layers. CouchDb has you 
covered if you just want to dump large media files streamed with HTTP range 
requests while you iterate fast and your userbase is small, and replication 
allows you to seamlessly scale by distributing load on clusters in advanced ways 
without needing to redesign your applications. The user nkosi23 nicely 
describes some of the new possibilities enabled by CouchDb:

https://github.com/apache/couchdb/pull/1253#issuecomment-507043600

However, the limitations introduced by FoundationDb, and the spirit of their 
project favoring abstraction purity through aggressive constraints over 
operational flexibility, are the opposite of the reasons we loved CouchDb and 
believed in it. It is to us pretty clear that the writing is on the wall. We 
aren't confident in FoundationDb to cover our bases, since covering our bases 
is explicitly not the goal of their project and their spirit is different from 
what has made CouchDb unique (ease of use, simple yet powerful and flexible 
abstractions, etc.).

#Lack of commitment to the ideas pioneered

We feel like CouchDb itself undervalues the wealth of what it has brought to 
the table. For example when it comes to architecting load balancing for all 
sorts of applications with a single and transparent value store, CouchDb 
enables things that simply weren't possible before, and people will need time 
to understand how they can take advantage of them.

Nowadays we can see sed, awk and such being used in pretty clever ways, but it 
took time for people to incorporate the possibilities enabled by these tools into 
their thinking process (even though system administration tools are much easier to 
deploy than enterprise applications).

I think that CouchDb should have a 10 or 20-year outlook on the paradigm shifts 
it introduces. There is a need to give more place to faith and less place to 
data, since not every usage will be adopted within 3 years. Sometimes you need 
to do things because you believe in them and you know you are right and that 
eventually people will come. But right now, it feels like customer statistics 
from Cloudant have become the main driver of the project. A balance can probably 
be found between aligning with business realities and evangelism realities. 
I feel IBM guys are totally right to share their insights, but if there are no 
faith-zealots to counter-balance, then a positive may become a negative.

#What we plan to do

For all these reasons, CouchDb 3 will likely be the last release we will use. 
What we are about to activate is an effort to rewrite CouchDb to focus on the 
use case that we think makes CouchDb unique: a one-stop shop for all data 
storage needs, no matter the type of application and load. This means focusing, 
on the one hand, on working seamlessly with extremely large attachments and 
documents of any size, and on the other hand, on replication features (the two 
go hand in hand).

We will also seek to resurrect old features such as list views that we think 
need long-term faith. To make it possible from a bandwidth perspective, we will 
make a number of radical decisions. The two most important ones may be the 
following:

- Only map/reduce will be supported. Far from a limitation, we see this as a way 
of life and a different way of thinking about designing line-of-business 
applications. Our finding is that a line-of-business application never needs 
SQL-style flexibility for the main app if the problem space has been correctly 
modeled (instead of being Excel in the web browser). When Business Analytics 
are really needed, the need is always very localized, and it is nowadays easy 
enough to have an ETL pipeline on a separate instance (especially considering 
CouchDb filtered replication capabilities).
- Rewrite CouchDb in F#.

Rewriting in F# will provide all the benefits of functional programming, 
while giving us access to a rich ecosystem of libraries, and a great static 
type checking system. All of this will mean more time to focus on the core 
features.

That is pretty much the gist of the plan. It is still early days, and the way 
we do things, we would typically roll it out internally for a number of years 
before announcing it to the public. So I think there will likely be a 
10-yearish window before you hear about this again.

I simply wanted to provide our feedback as a friendly contribution.
