Hi all,

I've checked the recent discussions and apparently July is the "vision month" lol. Hopefully this email will not exhaust the patience of the core team.
We have been thinking about forking or rewriting CouchDB internally for quite some time now, and the idea has matured to the point where I'm pretty confident it will materialize. We hesitated between two options: doing our thing internally and making a big open-sourcing announcement 5-10 years from now once the product is battle-tested, or announcing our intentions here today. However, I realized that good things may come from providing this feedback, and that providing it is also a way of giving back to the community.

The reason for this project is that we have lost confidence in how the vision of CouchDB aligns with our goals. As far as we are concerned, there are 3 things we loved about CouchDB:

#Map/Reduce

We think that the benefits of Map/Reduce are very underrated. Map/Reduce forces developers to approach problems differently and results in much more efficient and well-thought-out application architectures and implementations. This is in addition to the performance benefits, since indexes are built in advance in a very predictable manner (with a few well-documented caveats). For this reason, our developers are forbidden from using Mango, and we require them to wrap their heads around problems until they are able to solve them in map/reduce mode. However, we can see that the focus of the CouchDB project is increasingly on Mango, and we have little confidence in the project's commitment to first-class Map/Reduce support (while this was for us a defining aspect of the identity of CouchDB).

#Complexity of the codebase

Open-source software that is too complex to be tweaked and hacked is for all practical purposes closed-source software. You guys are VERY smart. And by nature a database system is a non-trivial piece of technology.
Initially we felt confident that the codebase was small enough and clean enough that, should we really need to get our hands dirty in an emergency, we would be able to do so. Then Mango made the situation a bit blurrier, but we could easily ignore that, especially since we do not use it. However, with FoundationDB... this becomes a whole different story. The domain model of a database is non-trivial by nature, and now FoundationDB will introduce an additional level of abstraction and indirection, and a very serious one. I've been reading the design discussions since the FoundationDB announcement, and there are a lot of impedance mismatches requiring the domain model of CouchDB to be broken up into fictitious entities intended to accommodate FoundationDB abstractions and their limitations (I'll come back to this point in a moment). Indirection is also introduced at the business logic level, with additional steps needed to emulate the desired behavior. All of this is complexity and obfuscation, and to be realistic, if we already struggled with the straight-to-the-point implementation, there is no way we'll be able to navigate (let alone hack) the FoundationDB-based implementation.

#(Apparent) Non-Alignment of FoundationDB with the reasons that made us love CouchDB

FoundationDB introduces limitations regarding transactions, document sizes and a number of other critical items. One of the main reasons we use CouchDB is the way it allows us to develop applications rapidly and flexibly address all the state storage needs of the application layers. CouchDB has you covered if you just want to dump large media files streamed with HTTP range requests while you iterate fast and your userbase is small, and replication allows you to scale seamlessly by distributing load across clusters in advanced ways without needing to redesign your applications.
The user nkosi23 nicely describes some of the new possibilities enabled by CouchDB: https://github.com/apache/couchdb/pull/1253#issuecomment-507043600

However, the limitations introduced by FoundationDB, and the spirit of their project, which favors abstraction purity through aggressive constraints over operational flexibility, are the opposite of the reasons we loved CouchDB and believed in it. To us it is pretty clear that the writing is on the wall. We aren't confident that FoundationDB will cover our bases, since covering our bases is explicitly not the goal of their project, and their spirit is different from what has made CouchDB unique (ease of use, simple yet powerful and flexible abstractions, etc.).

#Lack of commitment to the ideas pioneered

We feel like CouchDB itself undervalues the wealth of what it has brought to the table. For example, when it comes to architecting load balancing for all sorts of applications with a single and transparent value store, CouchDB enables things that simply weren't possible before, and people will need time to understand how they can take advantage of them. Nowadays we can see sed, awk and such being used in pretty clever ways, but it took time for people to incorporate the possibilities enabled by these tools into their thinking process (even though system administration tools are much easier to deploy than enterprise applications). I think that CouchDB should have a 10- or 20-year outlook on the paradigm shifts it introduces; there is a need to give more room to faith and less to data, since not every usage will be adopted within 3 years. Sometimes you need to do things because you believe in them and you know you are right and that eventually people will come. But right now, it feels like customer statistics from Cloudant have become the main driver of the project. A balance can probably be found between business realities and evangelism realities.
I feel the IBM guys are totally right to share their insights, but if there are no faith-zealots to counterbalance them, then a positive may become a negative.

#What we plan to do

For all these reasons, CouchDB 3 will likely be the last release we will use. What we are about to launch is an effort to rewrite CouchDB to focus on the use case that we think makes CouchDB unique: a one-stop shop for all data storage needs, no matter the type of application and load. This means focusing on the one hand on working seamlessly with extremely large attachments and documents of any size, and on the other hand on replication features (the two go hand in hand). We will also seek to resurrect old features such as list views that we think need long-term faith. To make this possible from a bandwidth perspective, we will make a number of radical decisions. The two most important ones may be the following:

- Only map/reduce will be supported. Far from a limitation, we see this as a way of life and a different way of thinking about designing line-of-business applications. Our finding is that a line-of-business application never needs SQL-style flexibility for the main app if the problem space has been correctly modeled (instead of being Excel in the web browser). When business analytics are really needed, the need is always very localized, and it is nowadays easy enough to have an ETL pipeline on a separate instance (especially considering CouchDB's filtered replication capabilities).
- Rewrite CouchDB in F#. Rewriting in F# will provide all the benefits of functional programming, while giving us access to a rich ecosystem of libraries and a great static type checking system. All of this will mean more time to focus on the core features.

This is, in a nutshell, pretty much the plan. This is still early stages, and the way we do things, we would typically roll it out internally for a number of years before announcing it to the public.
So I think there will likely be a 10-yearish window before you hear about this again. I simply wanted to provide our feedback as a friendly contribution.