I'll have to admit that I haven't spent a lot of time reading up on CRDTs, as
they seemed a bit more niche than what we'd want for a general purpose
document database. Although niche is perhaps the wrong word. More
complicated and hidden behind libraries, e.g., conflict-free counters. That
said, there was a paper [1] out recently talking about CRDTs for JSON. Maybe
there's something to that. I'm not against anything of this nature as long as
we don't end up with something that's more complicated but less generally
useful.
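
For anyone who hasn't run into them, here is a minimal sketch of the kind of
conflict-free counter I mean: a grow-only counter (G-Counter) in Python. The
class and replica names are made up for illustration, and nothing here is
CouchDB code or a proposed API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica (illustrative names)."""

    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        replicas = set(self.counts) | set(other.counts)
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        )


# Two replicas take writes independently, then exchange state.
a, b = GCounter(), GCounter()
a.increment("node_a", 2)
b.increment("node_b", 3)
merged = a.merge(b)
print(merged.value())  # 5
```

The merge never has to pick a winner, which is why this sort of structure
tends to live inside a library rather than in application code.
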
For the HTTP APIs, I'm imagining some sort of internal API at the fabric
level that could be easily re-used by many different APIs, which means
maintaining a backwards compatibility layer would be as easy as writing a new
WebSocket interface. Thus anyone needing to maintain old behavior on newer
versions could just plug in the "legacy interface" or what have you.

I agree that managing APIs on ports and so on is outside of CouchDB's
purview. Everyone already has to do http vs https, and then websockets would
be another thing. But this really gets into the load balancing question,
which is different for each user. Obviously we'll provide some guidance here,
but I don't think we should get too far into officiating how that works given
that there are so many different possibilities.

On Elixir I mostly agree. I haven't used it personally but it does look like
it could be a productivity enhancement. While Erlang's syntax is weird (and
that's not an excuse), Erlang unquestionably has a very dense standard
library, which I find to be the hardest part of learning "Erlang". Heck, I've
been writing Erlang full time for five years and I still have to reference
the lists module documentation almost daily because argument ordering is so
willy nilly. And then beyond that there's the depth of knowledge needed to
use OTP correctly. From what I've seen, Elixir has a more consistent design
and fixes a number of things of that nature. Whether that gets us more devs I
couldn't say for sure, but I'd be more than willing to give it a try and hope
that it just becomes obvious one way or the other whether it's worth it.

[1] http://arxiv.org/pdf/1608.03960v1.pdf

On Wed, Sep 28, 2016 at 1:17 PM, Robert Samuel Newson <[email protected]> wrote:
> Wow, long posts. :)
>
> This is the kind of thing I have been waiting to see for a while now,
> updating and expanding some of the fundamentals of couchdb.
>
> My personal interest for 3.0 is to address the known issues that arise from
> our approach to multi-master replication, the MVCC and ever-widening revision
> trees. We can't continue to have a story that replication "just works" if it
> also a) degrades super-linearly under reasonable conditions and b) requires
> manual remediation (which can be exacerbated by a).
>
> I like Paul's idea on graphs, it would be a neat solution to have a revision
> with multiple parents, a genuine conflict resolution that fans the structure
> back in. But can we do better? The rise of CRDT's has happened between the
> 1.6.1 and 2.0 release. Could we build in these kinds of structures or
> something similar such that conflicts are resolved in a semantically
> meaningful way at every step (avoiding wide revision graphs entirely)?
>
> Beyond that concern, I agree with Jan on the release cadence, one major
> feature per release (plus sundry fixes) and on focusing on a cowboy-based
> http layer. Now that 2.0 is out we can drop the couch_httpd_* modules
> (preserved by cloudant mostly to avoid an even larger diff with couch). I
> would like to take the opportunity to 'fix' the API (in terms of consistently
> adhering to REST semantics). To avoid a python 2/3 eternal non-migration,
> we'll need to at least inter-replicate. For Cloudant, we obviously couldn't
> drop the "old" (i.e, current) API for some considerable period of time, if
> ever, so some discussion on how we might have a version selection mechanism
> would be good. We might agree that it's not couchdb's concern at all.
>
> There were many other ideas raised here (and so good to see Adam contributing
> his thoughts again), they are all excellent and I'd love to see them all
> land. With one exception, Elixir. I think it's a whole separate thread to
> discuss it so I'll just summarise* my current position. I think it would be a
> mistake to mix two languages within, say, fabric, or couch_db_* modules.
> I'm not even sure it's wise or healthy to have some applications within the
> couchdb release tarball that are wholly elixir when all the current ones are
> wholly erlang. I obviously accept that erlang syntax is a barrier to many,
> and while I feel that's unreasonable (all syntax is weird) it does seem to be
> an accepted truth. So, for me, there's a discussion on how we would handle
> the mixed case. Perhaps there's even a discussion on whether we should, at
> least in principle, plan on fully migrating to elixir (over time / releases).
> The reason to do this is predicated on the assertion that there's a larger
> (compared to erlang) crowd of developers that would work on or with couchdb
> if we did so. That's obviously speculative, but it's the question to answer.
> If we would gain no more visibility, no more developer / contributor activity
> within the project or surrounding it, then it would be a spectacular waste of
> time. I daresay we might lose existing contributors in the push. I'll say,
> for my part, that I would learn and develop in Elixir if that was the
> consensus of the CouchDB community.
>
> Ta,
> B.
>
> * at length.
>
>
>> On 28 Sep 2016, at 18:04, Paul Davis <[email protected]> wrote:
>>
>> Thanks for the write up, Jan! I've only got one major change to add
>> and it's a bit of a doozy.
>>
>> # Update our revision model from a tree to a graph
>>
>> As a bit of background, our current revision model is a standard tree.
>> The biggest issue we've seen customers have with using CouchDB
>> "normally" is when they have a work load that generates conflicts with
>> regularity. This ends up creating revision trees with many thousands
>> of revisions with no general bound on growth of the tree. Eventually a
>> single document can take many seconds to minutes to update as the tree
>> has to be read from disk, updated, and then written back to disk.
>> The only effective solution to this currently is to have an operator
>> manually purge revisions from the document.
>>
>> The best solution I've come across to this issue would be to change
>> our revision model to be a graph. Then instead of resolving conflicts
>> by deleting a revision, the conflict is resolved by making an update
>> that references two or more revisions. In this way a customer that
>> generates a large number of conflicts can easily resolve the situation
>> during their normal conflict resolution process. (Our stemming would
>> change from keeping $revs_limit revisions for each leaf to doing a
>> breadth first search and keeping all revisions for the depth that
>> contains $revs_limit revisions or something.)
>>
>> While this approach is fairly straight forward in theory, the
>> difficult part is how we'd want to handle backwards compatibility with
>> replication. So far I could see having a replicator that could
>> translate new revision graphs to old revision trees by "undoing" merge
>> changes and creating the equivalent of the old "deleted revision"
>> logic. However, I don't see a way that we could go from the old format
>> back to the graph (ie, think about replicating a new style graph doc
>> through an old CouchDB version back to a new graph doc and end up with
>> the same doc).
>>
>> Obviously, this is a long term goal but I do think we should start
>> thinking about this and possibly making transition plans long term
>> (assuming everyone thinks this is a good idea).
>>
>> A couple other notes:
>>
>> For the HTTP API upgrades, we should look at this as a step in
>> refactoring the logic quite a bit and working for clean interfaces
>> internally. If we include that as part of the work then having
>> HTTP/HTTP2/WebSockets/Whatevers interfaces available would become much
>> easier. This then enables follow on features like replication over
>> WebSockets or even easier integration work as Koco suggests.
>> I also agree that this is probably our highest priority major feature.
>>
>> My second highest priority feature would then be the pluggable storage
>> engine work. I believe the current is solid and is minimally invasive.
>> Mayya Sharipova has also been doing some work at [1] using it to
>> enable an improved purge that will hopefully allow us to provide a
>> purge operation at the cluster level (as a bandaid for the revision
>> tree issues I mentioned above). I'd love to get this in and start
>> having other people hacking on alternative storage engine
>> implementations so we can refine the APIs even further.
>>
>> Lastly, for the smart cluster clients, the work I did for COUCHDB-2791 [2]
>> already implements a bit of that. There's definitely more to add here
>> to flesh it out but the surprisingly simple implementation makes me
>> think that it'd fit in quite nicely with the HTTP refactoring work
>> from above. I'm currently working on improving our API tests in Erlang
>> so that I can eventually turn those branches into PRs.
>>
>> [1] https://github.com/cloudant/couchdb-couch/commits/68275_cluster_purge
>> [2] https://issues.apache.org/jira/browse/COUCHDB-2791
>>
>> On Tue, Sep 27, 2016 at 7:56 AM, Jan Lehnardt <[email protected]> wrote:
>>> Hi all,
>>>
>>> apologies in advance, this is going to be a long email.
>>>
>>>
>>> I’ve been holding this back intentionally in order to be able to focus on
>>> shipping 2.0, but now that that’s out, I feel we should talk about what’s
>>> next.
>>>
>>> This email is separated into areas of work that I think CouchDB could
>>> improve on, some with very concrete plans, some with rather vague ideas.
>>> I’ve been collecting these over the past year or <strike>two</strike>five,
>>> so it’s fairly wide, but I’m sure I’m missing things that other people find
>>> important, so please add to this list.
>>>
>>> After the initial discussion here, I’ll move all of the individual issues
>>> to JIRA, so we can go down our usual process.
>>>
>>> This is basically my wish list, and I’d like this to become everyone’s wish
>>> list, so please add what I’ve been missing. :) Note, this isn’t a
>>> free-for-all, only suggest things that you are prepared to see through
>>> being shipped, from design, implementation to docs.
>>>
>>> I don’t have a specific order for these in mind, although I have a rough
>>> idea of what we should be doing first. Putting all of this on a roadmap is
>>> going to be a fun future exercise for us, though :)
>>>
>>> One last note: this doesn’t include anything on documentation or testing. I
>>> fully expect to step up our game from here on out. This list is for the
>>> technical aspects of the project.
>>>
>>> * * *
>>>
>>> These are the areas of work I’ve roughly come up with that my suggestions
>>> fit into:
>>>
>>> - API
>>> - Storage
>>> - Query
>>> - Replication
>>> - Cluster
>>> - Fauxton
>>> - Releases
>>> - Performance
>>> - Internals
>>> - Builds
>>> - Features
>>>
>>> (I’m not claiming these are any good, but it’s what I’ve got)
>>>
>>>
>>> Let’s go.
>>>
>>>
>>> * * *
>>>
>>> # API
>>>
>>> ## HTTP2
>>>
>>> I think this is an obvious first next step. Our HTTP Layer needs work, our
>>> existing HTTP server library is not getting HTTP2 support, it’s time to
>>> attack this head-first. I’m imagining a Cowboy[1]-based HTTP layer that
>>> calls into a unified internals layer and everything will be rose-golden.
>>> HTTP2 support for Cowboy is still in progress. Maybe we can help them
>>> along, or we focus on the internals refactor first and drop Cowboy in later
>>> (not sure how feasible this approach is, but we’ll figure this out).
>>>
>>> In my head, we focus on this and call the result 3.0 in 6-12 months. That
>>> doesn’t mean we *only* do this, but this will be the focus (more on this
>>> later).
>>>
>>> There are a few fun considerations, mainly of the “avoid Python
>>> 2/3-chasm”-type. Do we re-implement the 2.0 API with all its
>>> idiosyncrasies, or do we take the opportunity to clean things up while we
>>> are at it? If yes, how and how long do we support the then old API? Do we
>>> manage this via different ports? If yes, how can this be made to work for
>>> hosting services like Cloudant? Etc. etc.
>>>
>>> [1] https://github.com/ninenines/cowboy
>>>
>>>
>>> ## Sub-Document Operations
>>>
>>> Currently a doc update needs the whole doc body sent to the server. There
>>> are some obvious performance improvements possible. For the longest time, I
>>> wanted to see if we can model sub-document operations via JSON Pointers[2].
>>> These would roughly allow pointing to a JSON value via a URL.
>>>
>>> For example in this doc:
>>>
>>> {
>>>   "_id": "123abc",
>>>   "_rev": "zyx987",
>>>   "contact": {
>>>     "name": "",
>>>     "address": {
>>>       "street": "Long Street",
>>>       "nr": 123,
>>>       "zip": "12345"
>>>     }
>>>   }
>>> }
>>>
>>> An update to the zip code could look like this:
>>>
>>> curl -X POST $SERVER/db/123abc/_jsonpointer/contact/address/zip?rev=zyx987 \
>>>   -d '54321'
>>>
>>> GET/DELETE accordingly. We could shortcut the `_jsonpointer` to just `_` if
>>> we like the short magic.
>>>
>>> JSONPointer can deal with nested objects and lists and works fairly well
>>> for this type of stuff, and it is rather simple to implement (even I could
>>> do it:
>>> https://github.com/janl/erl-jsonpointer/blob/master/src/jsonpointer.erl -
>>> This idea is literally 5 years old, it looks like, no need to use my code
>>> if there is anything better).
>>>
>>> This is just a raw idea, and I’m happy to solve this any other way, if
>>> somebody has a good approach.
>>>
>>> [2] https://tools.ietf.org/html/rfc6901
>>>
>>>
>>> ## HTTP PATCH / JSON Diff
>>>
>>> Another stab at a similar problem is HTTP PATCH with JSON Diff, but with
>>> the inherent problems of JSON normalisation, I’m leaning towards the
>>> JSONPointer variant as simpler, but I’d be open for this as well, if
>>> someone comes up with a good approach.
>>>
>>>
>>> ## GraphQL[3]
>>>
>>> It’s rather new, but getting good traction[4]. This would be a nice
>>> addition to our API. Somebody might already be hacking on this ;)
>>>
>>> [3]: http://graphql.org
>>> [4]: http://githubengineering.com/the-github-graphql-api/
>>>
>>>
>>> ## Mango for Document Validation
>>>
>>> The only place where we absolutely require writing JS is
>>> validate_doc_update functions. Some security behaviour can only be enforced
>>> there. With their inherent performance problems, I’d like to get doc
>>> validations out of the path of the query server and would love to find a
>>> way to validate document updates through Mango.
>>>
>>>
>>> ## Redesign Security System
>>>
>>> Our security system is slowly grown and not coherently designed. We should
>>> start over. I have many ideas and opinions, but they are out of scope for
>>> this. I think everybody here agrees that we can do better. This *very
>>> likely* will *not* include per-document ACLs as per the often stated issues
>>> with that approach in our data model.
>>>
>>> * * *
>>>
>>>
>>> # Replication
>>>
>>> This is our flagship feature of course, and there are a few things we can
>>> do better.
>>>
>>>
>>> ## Mobile-optimised extension or new version of the protocol
>>>
>>> The original protocol design didn’t take mobile devices into account and
>>> through PouchDB et.al. we are now learning that there are a number of
>>> downsides to our protocol.
>>> We’ve helped a lot with introducing _bulk_get/_revs, but that’s more a
>>> bandaid than a considered strategy ;)
>>>
>>> That new version could also be HTTP2-only, to take advantage of the new
>>> connection semantics there.
>>>
>>>
>>> ## Easy way to skip deletes on sync
>>>
>>> This one is self-explanatory, mobile clients usually don’t need to sync
>>> deletes from a year ago first. Mango filters might already get us there,
>>> maybe we can do better.
>>>
>>>
>>> ## Sync a rolling subset
>>>
>>> Say you always want to keep the last 90 days of email on a mobile device
>>> with optionally back-loading older documents on user-request. It is
>>> something I could see getting a lot of traction.
>>>
>>> Today, this can be built on 1.x with clever use of _purge, but that’s
>>> hardly a good experience. I don’t know if it can be done in a cluster.
>>>
>>>
>>> ## Selective Sync
>>>
>>> There might be other criteria than “last 90 days”, so the more general
>>> solution to this problem class would be arbitrary (e.g. client-directed)
>>> selective sync, but this might be really hard as opposed to just very hard
>>> of the “last 90 days” one, so happy to punt on this first. But filters are
>>> generally not the answer, especially with large data sets. Maybe proper
>>> sync from views _changes is the answer.
>>>
>>>
>>> ## A _db_updates powered _replicator DB
>>>
>>> Running thousands+ of replications on a server is not really resource
>>> friendly today, we should teach the replicator to only run replication on
>>> active databases via _db_updates. Somebody might already be looking into
>>> this one.
>>>
>>> * * *
>>>
>>>
>>> # Storage
>>>
>>>
>>> ## Pluggable Storage Engines
>>>
>>> Paul Davis already showed some work on allowing multiple different storage
>>> backends. I’d like to see this land.
>>>
>>> ## Different Storage Backends
>>>
>>> These don’t all have to be supported by the main project, but I’d really
>>> like to see some experimentation with different backends like
>>> LevelDB[5]/RocksDB[6], InnoDB[7], SQLite[8], a native-erlang one that is
>>> optimised for space usage and not performance (I don’t want to budge on
>>> safety). Similarly, it’d be fun to see if there is a compression format
>>> that we can use as a storage backend directly, so we get full-DB
>>> compression as opposed to just per-doc compression.
>>>
>>> [5]: http://leveldb.org
>>> [6]: http://rocksdb.org
>>> [7]: https://en.wikipedia.org/wiki/InnoDB
>>> [8]: https://www.sqlite.org
>>>
>>> * * *
>>>
>>>
>>> # Query
>>>
>>> ## Teach Mango JOINs and result sorting
>>>
>>> It’s the natural path for query languages. We should make these happen.
>>> Once we have the basics, we might even be able to find a way to compile
>>> basic SQL into Mango, it’s going to be glorious :)
>>>
>>>
>>> ## “No-JavaScript”-mode
>>>
>>> I’ve hinted at this above, but I’d really like a way for users to use
>>> CouchDB productively without having to write a line of JavaScript. My main
>>> motivation is the poor performance characteristics of the Query Server
>>> (hello CGI[9]?). But even with one that is improved, it will always be
>>> faster to do any, say filtering or validation operations in native Erlang.
>>> I don’t know if we can expand Mango to cover all this, and I’m not really
>>> concerned about the specifics, as long as we get there.
>>>
>>> Of course, for pro-users, the JS-variant will still be around.
>>>
>>> [9]: https://en.wikipedia.org/wiki/Common_Gateway_Interface
>>>
>>>
>>> ## Query Server V2
>>>
>>> We need to revamp the Query Server. It is hardcoded to an out-of-date
>>> version of SpiderMonkey and we are stuck with C-bindings that barely anyone
>>> dares to look at, let alone iterate on.
>>>
>>> I believe the way forward is re-vamping the query server protocol to use
>>> streaming IO instead of blocking batches like we do now, and use JS-native
>>> implementation of the JS-side instead of C-bindings.
>>>
>>> I’m partial to doing this straight in Node, because there is a ton of
>>> support for things we need already, and I believe we’ve solved the
>>> isolation issues required for secure MapReduce, but I’m happy to use any
>>> other thing as well, if it helps.
>>>
>>> Other benefits would be support for emerging JS features that devs will
>>> want to use.
>>>
>>> And we can have two modes: standalone QS like now, and embedded QS where,
>>> say, V8 is compiled into the Erlang VM. Not everybody will want to run
>>> this, but it’ll be neat for those who do.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Cluster
>>>
>>> ## Rebalancing
>>>
>>> With this we will be able to grow clusters one by one instead of hitting a
>>> wall when eventually each shard lives on a single machine. E.g. when you
>>> add a node to the cluster, all other nodes share 1/Nth of their data with
>>> the new node, and everything can keep going. Same for removing a node and
>>> shrinking the cluster.
>>>
>>> Couchbase has this and it is really nice.
>>>
>>>
>>> ## Setup
>>>
>>> Even without rebalancing, we need a nice Fauxton UI to manage the cluster,
>>> so far we only have a simple setup procedure (which is great don’t get me
>>> wrong), but users will want to do more elaborate cluster management and we
>>> should make that easy with a slick UI.
>>>
>>>
>>> ## Cluster-Aware Clients
>>>
>>> This might end up being not a good idea, but I’d like some experimentation
>>> here. Say you’d have a CouchDB client that could be hooked into the cluster
>>> topology so it’d know which nodes to query for which data, then we can save
>>> a proxy-hop, and build clients that have lower-latency access to CouchDB.
>>> Again, this is something that Couchbase does and I think is worth exploring.
>>>
>>>
>>>
>>> * * *
>>>
>>>
>>> # Fauxton
>>>
>>> Fauxton is great, but it could be better too, I think. I’m mostly concerned
>>> about number of clicks/taps required for more specialised actions (like
>>> setting the group_level of a reduce query, it’s like 15 or so). More
>>> cluster info would also be nice, and maybe a specialised dashboard for
>>> db-per-user setups.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Releases
>>>
>>>
>>> ## Six-Week Release Trains
>>>
>>> We need to get back to frequent releases and I propose to go back to our
>>> six-week-release train plans from three years ago. Whatever lands within a
>>> release train time frame goes out. The nature of the change dictates the
>>> version number increment as per semver, and we just ship a new version
>>> every six weeks, even if it only includes a single bug fix. We should
>>> automate most of this infrastructure, so actual releases are cheap. We are
>>> reasonably close with this, but we need some more folks to step up on using
>>> and maintaining our CI systems.
>>>
>>>
>>> ## One major feature per major version
>>>
>>> I also propose to keep the scope of future major versions small, so we
>>> don’t have to wait another 3-5 years for 3.0. In particular, I think we
>>> should focus on a single major feature per major version and get that
>>> shipped within 6-12 months tops. If anything needs more time, it needs to
>>> be broken up. Of course we continue to add features and fix things while
>>> this happens, but as a project, there is *one* major feature we push. For
>>> example, for 3.0 I see our push be behind HTTP2 support. There is a lot of
>>> subsequent work required to make that happen, so it’ll be a worthwhile 3.0,
>>> but we can ship it in 6-12 months (hopefully).
>>>
>>> Best case scenario, we have CouchDB 4.0 coming out 12 months from now with
>>> two new major features. That would be amazing.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Performance
>>>
>>> ## Perf Team
>>>
>>> We need a team to comprehensively look at CouchDB performance. There is a
>>> lot of low-hanging fruit like Robert Kowalski showed a while back, we should
>>> get back into this. I’m mostly inspired by SQLite who’ve done a release a
>>> while back that only focussed on 1-2% performance improvements, but got
>>> like 20-30 of those and made the thing a lot faster across the board. I
>>> can’t remember where I read about this, but I’ll update this once I find
>>> the link.
>>>
>>>
>>> ## Benchmark Suite
>>>
>>> We need a benchmark suite that tests a variety of different work loads. The
>>> goal here is to run different versions of CouchDB against the same suite on
>>> the same hardware, to see where we are going. I’m imagining a
>>> http://arewefastyet.com style dashboard where we can track this, and even
>>> run this on Pull Requests and not allow them if they significantly impact
>>> performance.
>>>
>>>
>>> ## Synthetic Load Suite
>>>
>>> This one is for end users. I’d like to be able to say: My app produces
>>> mostly 10-20kb-sized docs, but millions of those in a single database, or
>>> across 1000s of databases, with these views etc. and then run this on
>>> target hardware so I’d know, e.g. how many nodes I need for a cluster with
>>> my estimated workload. I know this can only be done in approximation, but I
>>> think this could make a big difference in CouchDB adoption and feed back
>>> into Perf Team mentioned above.
>>>
>>> * * *
>>>
>>>
>>> # Internals
>>>
>>> ## Consolidate Repositories
>>>
>>> With 2.0 we started to experiment with radically small modules for our
>>> components and I think we’ve come to the conclusion that some consolidation
>>> is better for us going forward. Obvious candidates for separate repos are
>>> docs, Fauxton etc. but also some of the Erlang modules that other projects
>>> reasonably would use.
>>>
>>>
>>> ## Elixir
>>>
>>> I’d like it very much if we elevate Elixir as a prime target language for
>>> writing CouchDB internals. I believe this would get us an influx of new
>>> developers that we badly need to get all the things I’m listing here done.
>>> Somebody might be looking into the technical aspects of this already, but
>>> we need to decide as a project if we are okay with that.
>>>
>>>
>>> ## GitHub Issues
>>>
>>> I hope we can transition to GitHub Issues soon.
>>>
>>> * * *
>>>
>>>
>>> # Builds
>>>
>>> I’d like automated builds for source, Docker et.al., rpm, deb, brew, ports,
>>> Mac Binary, etc with proper release channels for people to subscribe to,
>>> all powered by CI for nightly builds, so people can test in-development
>>> versions easily.
>>>
>>> I’d also like builds that include popular community plugins like Geo or
>>> Fulltext Search.
>>>
>>>
>>>
>>> * * *
>>>
>>>
>>> # Features
>>>
>>> ## Better Support for db-per-user
>>>
>>> I don’t know what this will look like, but this is a pattern, and we need
>>> to support it better.
>>>
>>> One approach could be “virtual dbs” that are backed by a single database,
>>> but that’s usually at odds with views, so we could make this an XOR and
>>> disable views on these dbs. Since this usually powers client-heavy apps,
>>> querying usually happens there anyway.
>>>
>>> Another approach would be better / easier cross-db aggregation or querying.
>>> There are a few approaches, but nothing really slick.
>>>
>>>
>>> ## Schema Extraction
>>>
>>> I have half an (old) patch that extracts top level fields from a document
>>> and stores them with a hash in an “attachment” to the database header. So
>>> we only end up storing doc values and the schema hash.
>>> First of all this trades storage for CPU time (I haven’t measured anything
>>> yet), but more interestingly, we could use that schema data to do smart
>>> things like auto-generating a validation function / mango expression based
>>> on the data that is already in the database. And other fun things like
>>> easier schema migration operations that are native in CouchDB and thus a
>>> lot faster than external ones. For the curious ones, I’ve got the idea from
>>> V8’s property access optimisation strategy[10].
>>>
>>> [10]: https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
>>>
>>> * * *
>>>
>>> Alright, that’s it for now. Can’t wait for your feedback!
>>>
>>> Best
>>> Jan
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>>
>
