Re: CouchDB Next

Paul Davis Wed, 28 Sep 2016 12:22:56 -0700

I think that's slightly different, although related. It seems like a
separate feature where we could have a ddoc with a "drop during
compaction" function/mango query thing that would be more useful
there.


On Wed, Sep 28, 2016 at 12:48 PM, Adam Kocoloski <[email protected]> wrote:
> Cool. I think we can merge this topic with the “Tombstone Curation” topic I 
> posted.
>
> Adam
>
>
>> On Sep 28, 2016, at 1:04 PM, Paul Davis <[email protected]> wrote:
>>
>> Thanks for the write up, Jan! I've only got one major change to add
>> and it's a bit of a doozy.
>>
>> # Update our revision model from a tree to a graph
>>
>> As a bit of background, our current revision model is a standard tree.
>> The biggest issue we've seen customers have with using CouchDB
>> "normally" is when they have a workload that generates conflicts with
>> regularity. This ends up creating revision trees with many thousands
>> of revisions with no general bound on growth of the tree. Eventually a
>> single document can take many seconds to minutes to update as the tree
>> has to be read from disk, updated, and then written back to disk. The
>> only effective solution to this currently is to have an operator
>> manually purge revisions from the document.
>>
>> The best solution I've come across to this issue would be to change
>> our revision model to be a graph. Then instead of resolving conflicts
>> by deleting a revision, the conflict is resolved by making an update
>> that references two or more revisions. In this way a customer that
>> generates a large number of conflicts can easily resolve the situation
>> during their normal conflict resolution process. (Our stemming would
>> change from keeping $revs_limit revisions for each leaf to doing a
>> breadth first search and keeping all revisions for the depth that
>> contains $revs_limit revisions or something.)
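
A rough Python sketch of the graph idea above. This is toy code and not CouchDB's implementation: the `RevGraph` class, its method names, and the revision ids are made up for illustration, and `stem` is only one possible reading of the breadth-first stemming rule described in the parenthetical.

```python
class RevGraph:
    """Toy revision graph: each revision maps to the list of its parents."""

    def __init__(self):
        self.parents = {}  # rev id -> list of parent rev ids

    def add(self, rev, parents=()):
        # A normal edit has one parent; a merge revision names two or more.
        self.parents[rev] = list(parents)

    def leaves(self):
        # A leaf is a revision that no other revision lists as a parent.
        referenced = {p for ps in self.parents.values() for p in ps}
        return sorted(r for r in self.parents if r not in referenced)

    def stem(self, revs_limit):
        # Breadth-first walk from the leaves, keeping whole levels until
        # at least revs_limit revisions have been collected.
        keep = set()
        level = set(self.leaves())
        while level and len(keep) < revs_limit:
            keep |= level
            level = {p for r in level for p in self.parents[r]} - keep
        self.parents = {
            rev: [p for p in ps if p in keep]
            for rev, ps in self.parents.items()
            if rev in keep
        }


g = RevGraph()
g.add("1-a")
g.add("2-b", ["1-a"])
g.add("2-c", ["1-a"])          # conflicting edit of the same parent
conflicts = g.leaves()         # two leaves: the document is in conflict
g.add("3-d", ["2-b", "2-c"])   # merge revision references both leaves
resolved = g.leaves()          # a single leaf again
```

The point of the merge revision is that resolving a conflict shrinks the set of leaves without leaving a deleted branch behind, so stemming can bound the whole graph rather than each leaf's history.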
>>
>> While this approach is fairly straightforward in theory, the
>> difficult part is how we'd want to handle backwards compatibility with
>> replication. So far I could see having a replicator that could
>> translate new revision graphs to old revision trees by "undoing" merge
>> changes and creating the equivalent of the old "deleted revision"
>> logic. However, I don't see a way that we could go from the old format
>> back to the graph (i.e., think about replicating a new style graph doc
>> through an old CouchDB version back to a new graph doc and ending up with
>> the same doc).
>>
>> Obviously, this is a long-term goal but I do think we should start
>> thinking about this and possibly making long-term transition plans
>> (assuming everyone thinks this is a good idea).
>>
>> A couple other notes:
>>
>> For the HTTP API upgrades, we should look at this as a step in
>> refactoring the logic quite a bit and working for clean interfaces
>> internally. If we include that as part of the work then having
>> HTTP/HTTP2/WebSockets/Whatevers interfaces available would become much
>> easier. This then enables follow-on features like replication over
>> WebSockets or even easier integration work as Koco suggests. I also
>> agree that this is probably our highest priority major feature.
>>
>> My second highest priority feature would then be the pluggable storage
>> engine work. I believe the current work is solid and is minimally invasive.
>> Mayya Sharipova has also been doing some work at [1] using it to
>> enable an improved purge that will hopefully allow us to provide a
>> purge operation at the cluster level (as a bandaid for the revision
>> tree issues I mentioned above). I'd love to get this in and start
>> having other people hacking on alternative storage engine
>> implementations so we can refine the APIs even further.
>>
>> Lastly, for the smart cluster clients, the work I did for COUCHDB-2791 [2]
>> already implements a bit of that. There's definitely more to add here
>> to flesh it out but the surprisingly simple implementation makes me
>> think that it'd fit in quite nicely with the HTTP refactoring work
>> from above. I'm currently working on improving our API tests in Erlang
>> so that I can eventually turn those branches into PRs.
>>
>> [1] https://github.com/cloudant/couchdb-couch/commits/68275_cluster_purge
>> [2] https://issues.apache.org/jira/browse/COUCHDB-2791
>>
>> On Tue, Sep 27, 2016 at 7:56 AM, Jan Lehnardt <[email protected]> wrote:
>>> Hi all,
>>>
>>> apologies in advance, this is going to be a long email.
>>>
>>>
>>> I’ve been holding this back intentionally in order to be able to focus on 
>>> shipping 2.0, but now that that’s out, I feel we should talk about what’s 
>>> next.
>>>
>>> This email is separated into areas of work that I think CouchDB could 
>>> improve on, some with very concrete plans, some with rather vague ideas. 
>>> I’ve been collecting these over the past year or <strike>two</strike>five, 
>>> so it’s fairly wide, but I’m sure I’m missing things that other people find 
>>> important, so please add to this list.
>>>
>>> After the initial discussion here, I’ll move all of the individual issues 
>>> to JIRA, so we can go down our usual process.
>>>
>>> This is basically my wish list, and I’d like this to become everyone’s wish 
>>> list, so please add what I’ve been missing. :) — Note, this isn’t a 
>>> free-for-all, only suggest things that you are prepared to see through 
>>> being shipped, from design, implementation to docs.
>>>
>>> I don’t have a specific order for these in mind, although I have a rough 
>>> idea of what we should be doing first. Putting all of this on a roadmap is 
>>> going to be a fun future exercise for us, though :)
>>>
>>> One last note: this doesn’t include anything on documentation or testing. I 
>>> fully expect to step up our game from here on out. This list is for the 
>>> technical aspects of the project.
>>>
>>> * * *
>>>
>>> These are the areas of work I’ve roughly come up with that my suggestions 
>>> fit into:
>>>
>>> - API
>>> - Storage
>>> - Query
>>> - Replication
>>> - Cluster
>>> - Fauxton
>>> - Releases
>>> - Performance
>>> - Internals
>>> - Builds
>>> - Features
>>>
>>> (I’m not claiming these are any good, but it’s what I’ve got)
>>>
>>>
>>> Let’s go.
>>>
>>>
>>> * * *
>>>
>>> # API
>>>
>>> ## HTTP2
>>>
>>> I think this is an obvious first next step. Our HTTP Layer needs work, our 
>>> existing HTTP server library is not getting HTTP2 support, it’s time to 
>>> attack this head-first. I’m imagining a Cowboy[1]-based HTTP layer that 
>>> calls into a unified internals layer and everything will be rose-golden. 
>>> HTTP2 support for Cowboy is still in progress. Maybe we can help them 
>>> along, or we focus on the internals refactor first and drop Cowboy in later 
>>> (not sure how feasible this approach is, but we’ll figure this out).
>>>
>>> In my head, we focus on this and call the result 3.0 in 6-12 months. That 
>>> doesn’t mean we *only* do this, but this will be the focus (more on this 
>>> later).
>>>
>>> There are a few fun considerations, mainly of the “avoid Python 
>>> 2/3-chasm”-type. Do we re-implement the 2.0 API with all its 
>>> idiosyncrasies, or do we take the opportunity to clean things up while we 
>>> are at it? If yes, how and how long do we support the then old API? Do we 
>>> manage this via different ports? If yes, how can this be made to work for 
>>> hosting services like Cloudant? Etc. etc.
>>>
>>> [1] https://github.com/ninenines/cowboy
>>>
>>>
>>> ## Sub-Document Operations
>>>
>>> Currently a doc update needs the whole doc body sent to the server. There 
>>> are some obvious performance improvements possible. For the longest time, I 
>>> wanted to see if we can model sub-document operations via JSON Pointers[2]. 
>>> These would roughly allow pointing to a JSON value via a URL.
>>>
>>> For example in this doc:
>>>
>>> {
>>>  "_id": "123abc",
>>>  "_rev": "zyx987",
>>>  "contact": {
>>>    "name": "",
>>>    "address": {
>>>      "street": "Long Street",
>>>      "nr": 123,
>>>      "zip": "12345"
>>>    }
>>>  }
>>> }
>>>
>>> An update to the zip code could look like this:
>>>
>>> curl -X POST "$SERVER/db/123abc/_jsonpointer/contact/address/zip?rev=zyx987" \
>>>   -d '54321'
>>>
>>> GET/DELETE accordingly. We could shortcut the `_jsonpointer` to just `_` if 
>>> we like the short magic.
>>>
>>> JSONPointer can deal with nested objects and lists and works fairly well 
>>> for this type of stuff, and it is rather simple to implement (even I could 
>>> do it: 
>>> https://github.com/janl/erl-jsonpointer/blob/master/src/jsonpointer.erl — 
>>> This idea is literally 5 years old, it looks like, no need to use my code 
>>> if there is anything better).
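
To show how little code the pointer handling needs, here is a minimal Python sketch of RFC 6901 lookup and update, applied to the example document above. The function names are made up for illustration, it skips the `-` array token and most error handling, and it says nothing about how a `_jsonpointer` endpoint would deal with revisions on the server.

```python
def parse_pointer(pointer):
    """Split an RFC 6901 pointer such as "/contact/address/zip" into tokens."""
    if pointer == "":
        return []  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    # RFC 6901 escapes: "~1" stands for "/", "~0" for "~" (decode "~1" first).
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _step(node, token):
    # Lists are indexed by decimal digits, objects by member name.
    return node[int(token)] if isinstance(node, list) else node[token]


def get_pointer(doc, pointer):
    node = doc
    for token in parse_pointer(pointer):
        node = _step(node, token)
    return node


def set_pointer(doc, pointer, value):
    tokens = parse_pointer(pointer)
    if not tokens:
        raise ValueError("replacing the whole document is not a sub-document update")
    parent = doc
    for token in tokens[:-1]:
        parent = _step(parent, token)
    last = tokens[-1]
    if isinstance(parent, list):
        parent[int(last)] = value
    else:
        parent[last] = value
    return doc


doc = {
    "_id": "123abc",
    "_rev": "zyx987",
    "contact": {
        "name": "",
        "address": {"street": "Long Street", "nr": 123, "zip": "12345"},
    },
}
set_pointer(doc, "/contact/address/zip", "54321")
```

A server-side version would apply `set_pointer` to the stored body, check the supplied `rev`, and write a new revision, so the client only sends the changed value.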
>>>
>>> This is just a raw idea, and I’m happy to solve this any other way, if 
>>> somebody has a good approach.
>>>
>>> [2] https://tools.ietf.org/html/rfc6901
>>>
>>>
>>> ## HTTP PATCH / JSON Diff
>>>
>>> Another stab at a similar problem is HTTP PATCH with JSON Diff, but with 
>>> the inherent problems of JSON normalisation, I’m leaning towards the 
>>> JSONPointer variant as simpler, but I’d be open to this as well, if 
>>> someone comes up with a good approach.
>>>
>>>
>>> ## GraphQL[3]
>>>
>>> It’s rather new, but getting good traction[4]. This would be a nice 
>>> addition to our API. Somebody might already be hacking on this ;)
>>>
>>> [3]: http://graphql.org
>>> [4]: http://githubengineering.com/the-github-graphql-api/
>>>
>>>
>>> ## Mango for Document Validation
>>>
>>> The only place where we absolutely require writing JS is 
>>> validate_doc_update functions. Some security behaviour can only be enforced 
>>> there. With their inherent performance problems, I’d like to get doc 
>>> validations out of the path of the query server and would love to find a 
>>> way to validate document updates through Mango.
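
One way to picture this, as a hedged Python sketch: treat a Mango-style selector as the validation rule and reject any write that does not match it. The `matches` and `validate_doc_update` helpers below are hypothetical, cover only flat top-level fields and a handful of operators, and are not how Mango is implemented in CouchDB.

```python
MISSING = object()  # marks a field that is absent from the document

OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
}


def matches(doc, selector):
    """Return True if every top-level field condition in selector holds."""
    for field, condition in selector.items():
        value = doc.get(field, MISSING)
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {"type": "user"} is shorthand for $eq
        for op, arg in condition.items():
            if op == "$exists":
                if (value is not MISSING) != arg:
                    return False
            elif value is MISSING or not OPERATORS[op](value, arg):
                return False
    return True


def validate_doc_update(new_doc, selector):
    # Reject the write unless the new document matches the selector.
    if not matches(new_doc, selector):
        raise ValueError("forbidden: document does not match the validation selector")


selector = {"type": "user", "name": {"$exists": True}, "age": {"$gte": 18}}
validate_doc_update({"type": "user", "name": "jan", "age": 30}, selector)  # accepted
```

Because the rule is plain data rather than a JS function, it could be evaluated natively without a round trip to the query server, which is the motivation given above.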
>>>
>>>
>>> ## Redesign Security System
>>>
>>> Our security system has grown slowly and was not coherently designed. We should 
>>> start over. I have many ideas and opinions, but they are out of scope for 
>>> this. I think everybody here agrees that we can do better. This *very 
>>> likely* will *not* include per-document ACLs as per the often stated issues 
>>> with that approach in our data model.
>>>
>>> * * *
>>>
>>>
>>> # Replication
>>>
>>> This is our flagship feature of course, and there are a few things we can 
>>> do better.
>>>
>>>
>>> ## Mobile-optimised extension or new version of the protocol
>>>
>>> The original protocol design didn’t take mobile devices into account and 
>>> through PouchDB et al. we are now learning that there are a number of 
>>> downsides to our protocol. We’ve helped a lot with introducing 
>>> _bulk_get/_revs, but that’s more a bandaid than a considered strategy ;)
>>>
>>> That new version could also be HTTP2-only, to take advantage of the new 
>>> connection semantics there.
>>>
>>>
>>> ## Easy way to skip deletes on sync
>>>
>>> This one is self-explanatory, mobile clients usually don’t need to sync 
>>> deletes from a year ago first. Mango filters might already get us there, 
>>> maybe we can do better.
>>>
>>>
>>> ## Sync a rolling subset
>>>
>>> Say you always want to keep the last 90 days of email on a mobile device 
>>> with optionally back-loading older documents on user-request. It is 
>>> something I could see getting a lot of traction.
>>>
>>> Today, this can be built on 1.x with clever use of _purge, but that’s 
>>> hardly a good experience. I don’t know if it can be done in a cluster.
>>>
>>>
>>> ## Selective Sync
>>>
>>> There might be other criteria than “last 90 days”, so the more general 
>>> solution to this problem class would be arbitrary (e.g. client-directed) 
>>> selective sync, but this might be really hard, as opposed to the merely very 
>>> hard “last 90 days” case, so I’m happy to punt on this at first. But filters are 
>>> generally not the answer, especially with large data sets. Maybe proper 
>>> sync from views _changes is the answer.
>>>
>>>
>>> ## A _db_updates powered _replicator DB
>>>
>>> Running thousands+ of replications on a server is not really resource 
>>> friendly today, we should teach the replicator to only run replication on 
>>> active databases via _db_updates. Somebody might already be looking into 
>>> this one.
>>>
>>> * * *
>>>
>>>
>>> # Storage
>>>
>>>
>>> ## Pluggable Storage Engines
>>>
>>> Paul Davis already showed some work on allowing multiple different storage 
>>> backends. I’d like to see this land.
>>>
>>> ## Different Storage Backends
>>>
>>> These don’t all have to be supported by the main project, but I’d really 
>>> like to see some experimentation with different backends like 
>>> LevelDB[5]/RocksDB[6], InnoDB[7], SQLite[8], or a native Erlang one that is 
>>> optimised for space usage and not performance (I don’t want to budge on 
>>> safety). Similarly, it’d be fun to see if there is a compression format 
>>> that we can use as a storage backend directly, so we get full-DB 
>>> compression as opposed to just per-doc compression.
>>>
>>> [5]: http://leveldb.org
>>> [6]: http://rocksdb.org
>>> [7]: https://en.wikipedia.org/wiki/InnoDB
>>> [8]: https://www.sqlite.org
>>>
>>> * * *
>>>
>>>
>>> # Query
>>>
>>> ## Teach Mango JOINs and result sorting
>>>
>>> It’s the natural path for query languages. We should make these happen. 
>>> Once we have the basics, we might even be able to find a way to compile 
>>> basic SQL into Mango, it’s going to be glorious :)
>>>
>>>
>>> ## “No-JavaScript”-mode
>>>
>>> I’ve hinted at this above, but I’d really like a way for users to use 
>>> CouchDB productively without having to write a line of JavaScript. My main 
>>> motivation is the poor performance characteristics of the Query Server 
>>> (hello CGI[9]?). But even with one that is improved, it will always be faster 
>>> to do, say, filtering or validation operations in native Erlang. I don’t 
>>> know if we can expand Mango to cover all this, and I’m not really concerned 
>>> about the specifics, as long as we get there.
>>>
>>> Of course, for pro-users, the JS-variant will still be around.
>>>
>>> [9]: https://en.wikipedia.org/wiki/Common_Gateway_Interface
>>>
>>>
>>> ## Query Server V2
>>>
>>> We need to revamp the Query Server. It is hardcoded to an out-of-date 
>>> version of SpiderMonkey and we are stuck with C-bindings that barely anyone 
>>> dares to look at, let alone iterate on.
>>>
>>> I believe the way forward is revamping the query server protocol to use 
>>> streaming IO instead of blocking batches like we do now, and to use a JS-native 
>>> implementation of the JS-side instead of C-bindings.
>>>
>>> I’m partial to doing this straight in Node, because there is a ton of 
>>> support for things we need already, and I believe we’ve solved the 
>>> isolation issues required for secure MapReduce, but I’m happy to use any 
>>> other thing as well, if it helps.
>>>
>>> Other benefits would be support for emerging JS features that devs will 
>>> want to use.
>>>
>>> And we can have two modes: standalone QS like now, and embedded QS where, 
>>> say, V8 is compiled into the Erlang VM. Not everybody will want to run 
>>> this, but it’ll be neat for those who do.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Cluster
>>>
>>> ## Rebalancing
>>>
>>> With this we will be able to grow clusters one by one instead of hitting a 
>>> wall when eventually each shard lives on a single machine. E.g. when you 
>>> add a node to the cluster, all other nodes share 1/Nth of their data with 
>>> the new node, and everything can keep going. Same for removing a node and 
>>> shrinking the cluster.
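
The 1/Nth hand-over can be sketched in a few lines of Python. This is a simplification with made-up names (`add_node`, the node and shard ids): it ignores replica placement and the actual data transfer, and only shows how the shard map could change when a node joins.

```python
def add_node(assignment, new_node):
    """Return a new shard assignment after new_node joins the cluster.

    assignment maps node name -> list of shard ids. Each existing node hands
    over roughly 1/N of its shards, where N is the node count after the join.
    """
    n = len(assignment) + 1
    result = {node: list(shards) for node, shards in assignment.items()}
    result[new_node] = []
    for node, shards in assignment.items():
        give = len(shards) // n
        if give:
            result[new_node].extend(shards[-give:])
            result[node] = shards[:-give]
    return result


cluster = {
    "node1": ["s0", "s1", "s2", "s3", "s4", "s5"],
    "node2": ["s6", "s7", "s8", "s9", "s10", "s11"],
}
cluster = add_node(cluster, "node3")  # every node now holds four shards
```

Only the shards that move need to be copied; the rest stay where they are, which is what lets the cluster keep serving requests while it grows.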
>>>
>>> Couchbase has this and it is really nice.
>>>
>>>
>>> ## Setup
>>>
>>> Even without rebalancing, we need a nice Fauxton UI to manage the cluster, 
>>> so far we only have a simple setup procedure (which is great, don’t get me 
>>> wrong), but users will want to do more elaborate cluster management and we 
>>> should make that easy with a slick UI.
>>>
>>>
>>> ## Cluster-Aware Clients
>>>
>>> This might end up being not a good idea, but I’d like some experimentation 
>>> here. Say you’d have a CouchDB client that could be hooked into the cluster 
>>> topology so it’d know which nodes to query for which data, then we can save 
>>> a proxy-hop, and build clients that have lower-latency access to CouchDB. 
>>> Again, this is something that Couchbase does and I think is worth exploring.
>>>
>>>
>>>
>>> * * *
>>>
>>>
>>> # Fauxton
>>>
>>> Fauxton is great, but it could be better too, I think. I’m mostly concerned 
>>> about the number of clicks/taps required for more specialised actions (like 
>>> setting the group_level of a reduce query, it’s like 15 or so). More 
>>> cluster info would also be nice, and maybe a specialised dashboard for 
>>> db-per-user setups.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Releases
>>>
>>>
>>> ## Six-Week Release Trains
>>>
>>> We need to get back to frequent releases and I propose to go back to our 
>>> six-week-release train plans from three years ago. Whatever lands within a 
>>> release train time frame goes out. The nature of the change dictates the 
>>> version number increment as per semver, and we just ship a new version 
>>> every six weeks, even if it only includes a single bug fix. We should 
>>> automate most of this infrastructure, so actual releases are cheap. We are 
>>> reasonably close with this, but we need some more folks to step up on using 
>>> and maintaining our CI systems.
>>>
>>>
>>> ## One major feature per major version
>>>
>>> I also propose to keep the scope of future major versions small, so we 
>>> don’t have to wait another 3-5 years for 3.0. In particular, I think we 
>>> should focus on a single major feature per major version and get that 
>>> shipped within 6-12 months tops. If anything needs more time, it needs to 
>>> be broken up. Of course we continue to add features and fix things while 
>>> this happens, but as a project, there is *one* major feature we push. For 
>>> example, for 3.0 I see our push being behind HTTP2 support. There is a lot of 
>>> subsequent work required to make that happen, so it’ll be a worthwhile 3.0, 
>>> but we can ship it in 6-12 months (hopefully).
>>>
>>> Best case scenario, we have CouchDB 4.0 coming out 12 months from now with 
>>> two new major features. That would be amazing.
>>>
>>>
>>> * * *
>>>
>>>
>>> # Performance
>>>
>>> ## Perf Team
>>>
>>> We need a team to look comprehensively at CouchDB performance. There is a lot 
>>> of low-hanging fruit like Robert Kowalski showed a while back, we should 
>>> get back into this. I’m mostly inspired by SQLite who’ve done a release a 
>>> while back that only focussed on 1-2% performance improvements, but got 
>>> like 20-30 of those and made the thing a lot faster across the board. I 
>>> can’t remember where I read about this, but I’ll update this once I find 
>>> the link.
>>>
>>>
>>> ## Benchmark Suite
>>>
>>> We need a benchmark suite that tests a variety of different workloads. The 
>>> goal here is to run different versions of CouchDB against the same suite on 
>>> the same hardware, to see where we are going. I’m imagining a 
>>> http://arewefastyet.com style dashboard where we can track this, and even 
>>> run this on Pull Requests and not allow them if they significantly impact 
>>> performance.
>>>
>>>
>>> ## Synthetic Load Suite
>>>
>>> This one is for end users. I’d like to be able to say: My app produces 
>>> mostly 10-20kb-sized docs, but millions of those in a single database, or 
>>> across 1000s of databases, with these views etc. and then run this on 
>>> target hardware so I’d know, e.g. how many nodes I need for a cluster with 
>>> my estimated workload. I know this can only be done in approximation, but I 
>>> think this could make a big difference in CouchDB adoption and feed back 
>>> into the Perf Team mentioned above.
>>>
>>> * * *
>>>
>>>
>>> # Internals
>>>
>>> ## Consolidate Repositories
>>>
>>> With 2.0 we started to experiment with radically small modules for our 
>>> components and I think we’ve come to the conclusion that some consolidation 
>>> is better for us going forward. Obvious candidates for separate repos are 
>>> docs, Fauxton etc. but also some of the Erlang modules that other projects 
>>> reasonably would use.
>>>
>>>
>>> ## Elixir
>>>
>>> I’d like it very much if we elevate Elixir as a prime target language for 
>>> writing CouchDB internals. I believe this would get us an influx of new 
>>> developers that we badly need to get all the things I’m listing here done. 
>>> Somebody might be looking into the technical aspects of this already, but 
>>> we need to decide as a project if we are okay with that.
>>>
>>>
>>> ## GitHub Issues
>>>
>>> I hope we can transition to GitHub Issues soon.
>>>
>>> * * *
>>>
>>>
>>> # Builds
>>>
>>> I’d like automated builds for source, Docker et al., rpm, deb, brew, ports, 
>>> Mac Binary, etc with proper release channels for people to subscribe to, 
>>> all powered by CI for nightly builds, so people can test in-development 
>>> versions easily.
>>>
>>> I’d also like builds that include popular community plugins like Geo or 
>>> Fulltext Search.
>>>
>>>
>>>
>>> * * *
>>>
>>>
>>> # Features
>>>
>>> ## Better Support for db-per-user
>>>
>>> I don’t know what this will look like, but this is a pattern, and we need 
>>> to support it better.
>>>
>>> One approach could be “virtual dbs” that are backed by a single database, 
>>> but that’s usually at odds with views, so we could make this an XOR and 
>>> disable views on these dbs. Since this usually powers client-heavy apps, 
>>> querying usually happens there anyway.
>>>
>>> Another approach would be better / easier cross-db aggregation or querying. 
>>> There are a few approaches, but nothing really slick.
>>>
>>>
>>> ## Schema Extraction
>>>
>>> I have half an (old) patch that extracts top level fields from a document 
>>> and stores them with a hash in an “attachment” to the database header. So 
>>> we only end up storing doc values and the schema hash. First of all this 
>>> trades storage for CPU time (I haven’t measured anything yet), but more 
>>> interestingly, we could use that schema data to do smart things like 
>>> auto-generating a validation function / mango expression based on the data 
>>> that is already in the database. And other fun things like easier schema 
>>> migration operations that are native in CouchDB and thus a lot faster than 
>>> external ones. For the curious ones, I got the idea from V8’s property 
>>> access optimisation strategy[10].
>>>
>>> [10]: https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
>>>
>>> * * *
>>>
>>> Alright, that’s it for now. Can’t wait for your feedback!
>>>
>>> Best
>>> Jan
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>>
>
