Re: CouchDB Next

Jan Lehnardt Tue, 27 Sep 2016 12:26:09 -0700

James, awesome points, thank you for sharing, would love to talk details at
some point :)


> On 27 Sep 2016, at 20:16, Mutton, James <[email protected]> wrote:
> 
> # Query-server side : I've done some performance testing on some very/very
> complicated (read "never should have been written") design docs and found
> very little performance gain just between c-bindings and Node.  The
> biggest gain there is in memory management.  I've watched the SM view
> server fragment the memory to be unmallocable while a node view server
> happily chugs along to 2-3 times the abuse before crashing, but it's no
> faster.  The biggest gain is in staying very far away from the double
> serialization/deserialization and the FCGI-like interface and getting it
> native.
> 
> # _jsonpointer (we call it Fields/Filters) : We also ended up doing this a
> while ago but we broke it into 2 tangential APIs.  First was what we
> called the fields api where you'd do /DB/DOC/_fields/FIELDNAME to retrieve
> a sub-value from that field located with a JSONPath expression.  We also
> tangentially added a ?filter=FILTERNAME option to run the output through a
> native filter-stage that one could plug Erlang modules into.  This lets
> you compose some interesting stuff like send a protobuf as binary
> converted from a document, store some small object value as base64 and
> retrieve it binary without dealing with attachments, or encrypt something
> using an asymmetric key and natively/transparently decrypt or sign it etc...
> 
> # Wide-clusters (probably a variant on cluster-aware clients) : Scaling
> out clustered-couch to "very-large" ops per second is an interesting
> experience.  The basic premise that clusters contain all of the database
> is nice in that as you create copies of your data you get always-close
> behaviors,  but there's a limit to individual cluster performance.
> Depending on the load-balancing involved in reaching your database, how
> spread your installation/clients are and how much hot-spotting you experience,
> you can much more easily end up in cases where reads/writes have to be
> shunted to another cluster and because of replication-delay apps start to
> see strange behavior.  One thing I've considered doing recently is using a
> transparent routing proxy on-top of couch that takes a provisioning
> configuration to locate clusters containing a specific database and use
> consistent-hashing to spread the keyspace transactions into predictable
> buckets, hinting back to the client what cluster-buckets were used.  While
> not necessarily a couch-specific feature, and still subject to its own
> nuances, it's a useful lesson in scaling for the inevitable limit with 2.0
> installations.
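
As a rough illustration of the consistent-hashing idea in the paragraph above, here is a minimal Python sketch. It is not James's proxy: the class name `HashRing`, the cluster names, the key format and the use of 64 virtual nodes per cluster are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to (hypothetical) cluster names."""

    def __init__(self, clusters, vnodes=64):
        # Each cluster gets `vnodes` points on the ring to smooth the spread.
        self._ring = sorted(
            (self._hash(f"{cluster}#{i}"), cluster)
            for cluster in clusters
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only for its even distribution, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def cluster_for(self, key):
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


# Hypothetical cluster names and key; a proxy could return `target` to the
# client as the "cluster-bucket" hint.
ring = HashRing(["cluster-a", "cluster-b", "cluster-c"])
target = ring.cluster_for("userdb-1234/doc-42")
```

The property that makes this attractive for routing is that removing a cluster only remaps the keys that lived on it; keys on the remaining clusters keep their bucket.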
> 
> </JamesM>
> 
> 
> 
> 
> 
> On 9/27/16, 7:57, "Adam Kocoloski" <[email protected]> wrote:
> 
>> Wow, thanks for kicking this off Jan. Lots of good ideas in that list. I
>> have a few additional ideas:
>> 
>> # Containers and Package Management
>> 
>> Deploying an Erlang-based system can still be an unfriendly exercise.
>> Rather than redouble our efforts to play nice with all variants of
>> distro-specific package managers out there, let's do the following: a)
>> deliver an official Snap package (thanks Michael!) for those who still
>> want Linux packages, and b) plug our Docker image into the popular
>> container orchestration frameworks. I want to see one-touch cluster(able)
>> deployments in Kubernetes, DC/OS Universe, and Docker Compose / DAB.
>> Promote these options as preferred ways to get up and running with
>> CouchDB.
>> 
>> # Tombstone Curation
>> 
>> You touched on this with some of your thoughts on Replication, but I'd
>> like to investigate ways to excise tombstones safely from existing
>> databases. We know that documents with wide revision trees become very
>> unwieldy, and that the best practices around conflict management do
>> nothing to help address this. Rather than ask the user to go in and
>> manually purge records, can we compute when it's safe to automatically
>> prune a deleted edit branch?
>> 
>> # Database Archival
>> 
>> Clustering is great and all, but sometimes one just wants to get old data
>> out of the database and into some cheaper storage. Many IoT historian use
>> cases fall into this bucket. This work could take a lot of different
>> forms, from simple whole-database archival to more subtle policy-based
>> archiving within a database.
>> 
>> # Integrations - Object Storage, Kafka, Spark
>> 
>> Most of the pluggable storage engines you mentioned will not be happy
>> about storing large attachments. We started a bit of work to optionally
>> offload those attachments into an object store like S3 or Swift and keep
>> just the metadata in CouchDB; I'd like to see that through. I'd also like
>> to establish a stronger linkage with a few of our Apache brethren.
>> Enabling a _changes feed to be published in Kafka (and a Kafka topic to
>> be loaded into a database) will help Couch play in more sophisticated
>> data processing pipelines. On the Spark side we've already written code
>> that can be used to expose CouchDB as an external datasource, but there
>> are still some significant optimizations that we can apply (in the vein
>> of the cluster-aware clients mentioned below).
>> 
>> Adam
>> 
>>> On Sep 27, 2016, at 5:56 AM, Jan Lehnardt <[email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> apologies in advance, this is going to be a long email.
>>> 
>>> 
>>> I've been holding this back intentionally in order to be able to focus
>>> on shipping 2.0, but now that that's out, I feel we should talk about
>>> what's next.
>>> 
>>> This email is separated into areas of work that I think CouchDB could
>>> improve on, some with very concrete plans, some with rather vague ideas.
>>> I've been collecting these over the past year or
>>> <strike>two</strike>five, so it's fairly wide, but I'm sure I'm missing
>>> things that other people find important, so please add to this list.
>>> 
>>> After the initial discussion here, I'll move all of the individual
>>> issues to JIRA, so we can go down our usual process.
>>> 
>>> This is basically my wish list, and I'd like this to become everyone's
>>> wish list, so please add what I've been missing. :) Note, this isn't a
>>> free-for-all, only suggest things that you are prepared to see through
>>> being shipped, from design, implementation to docs.
>>> 
>>> I don't have a specific order for these in mind, although I have a
>>> rough idea of what we should be doing first. Putting all of this on a
>>> roadmap is going to be a fun future exercise for us, though :)
>>> 
>>> One last note: this doesn't include anything on documentation or
>>> testing. I fully expect to step up our game from here on out. This list is
>>> for the technical aspects of the project.
>>> 
>>> * * *
>>> 
>>> These are the areas of work I've roughly come up with that my
>>> suggestions fit into:
>>> 
>>> - API
>>> - Storage
>>> - Query
>>> - Replication
>>> - Cluster
>>> - Fauxton
>>> - Releases
>>> - Performance
>>> - Internals
>>> - Builds
>>> - Features
>>> 
>>> (I'm not claiming these are any good, but it's what I've got)
>>> 
>>> 
>>> Let's go.
>>> 
>>> 
>>> * * *
>>> 
>>> # API
>>> 
>>> ## HTTP2
>>> 
>>> I think this is an obvious first next step. Our HTTP Layer needs work,
>>> our existing HTTP server library is not getting HTTP2 support, it's time
>>> to attack this head-first. I'm imagining a Cowboy[1]-based HTTP layer
>>> that calls into a unified internals layer and everything will be
>>> rose-golden. HTTP2 support for Cowboy is still in progress. Maybe we can
>>> help them along, or we focus on the internals refactor first and drop
>>> Cowboy in later (not sure how feasible this approach is, but we'll
>>> figure this out).
>>> 
>>> In my head, we focus on this and call the result 3.0 in 6-12 months.
>>> That doesnąt mean we *only* do this, but this will be the focus (more on
>>> this later).
>>> 
>>> There are a few fun considerations, mainly of the "avoid Python
>>> 2/3-chasm"-type. Do we re-implement the 2.0 API with all its
>>> idiosyncrasies, or do we take the opportunity to clean things up while
>>> we are at it? If yes, how and how long do we support the then old API?
>>> Do we manage this via different ports? If yes, how can this be made to
>>> work for hosting services like Cloudant? Etc. etc.
>>> 
>>> [1] https://github.com/ninenines/cowboy
>>> 
>>> 
>>> ## Sub-Document Operations
>>> 
>>> Currently a doc update needs the whole doc body sent to the server.
>>> There are some obvious performance improvements possible. For the
>>> longest time, I wanted to see if we can model sub-document operations
>>> via JSON Pointers[2]. These would roughly allow pointing to a JSON value
>>> via a URL.
>>> 
>>> For example in this doc:
>>> 
>>> {
>>> "_id": "123abc",
>>> "_rev": "zyx987",
>>> "contact": {
>>>   "name": "",
>>>   "address": {
>>>     "street": "Long Street",
>>>     "nr": 123,
>>>     "zip": "12345"
>>>   }
>>> }
>>> }
>>> 
>>> An update to the zip code could look like this:
>>> 
>>> curl -X POST \
>>>   "$SERVER/db/123abc/_jsonpointer/contact/address/zip?rev=zyx987" \
>>>   -d '"54321"'
>>> 
>>> GET/DELETE accordingly. We could shortcut the `_jsonpointer` to just
>>> `_` if we like the short magic.
>>> 
>>> JSONPointer can deal with nested objects and lists and works fairly
>>> well for this type of stuff, and it is rather simple to implement (even
>>> I could do it:
>>> https://github.com/janl/erl-jsonpointer/blob/master/src/jsonpointer.erl
>>> - this idea is literally 5 years old, it looks like, no need to use my
>>> code if there is anything better).
>>> 
>>> This is just a raw idea, and I'm happy to solve this any other way, if
>>> somebody has a good approach.
>>> 
>>> [2] https://tools.ietf.org/html/rfc6901
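
To make the sub-document idea concrete, here is a minimal Python sketch of RFC 6901 pointer resolution and replacement, applied to the example document above. It is not the linked Erlang module or any CouchDB API: the function names `resolve` and `replace` are made up for the example, and the `_jsonpointer` endpoint itself remains a proposal.

```python
import copy


def _tokens(pointer):
    # RFC 6901: "" addresses the whole document; otherwise tokens are
    # "/"-separated, with "~1" decoded to "/" first and then "~0" to "~".
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _step(node, token):
    # Lists are addressed by non-negative decimal index, objects by key.
    if isinstance(node, list):
        if not token.isdigit():
            raise KeyError(f"invalid array index: {token!r}")
        return node[int(token)]
    return node[token]


def resolve(doc, pointer):
    """Return the value that `pointer` addresses inside `doc`."""
    node = doc
    for token in _tokens(pointer):
        node = _step(node, token)
    return node


def replace(doc, pointer, value):
    """Return a copy of `doc` with the value at `pointer` replaced."""
    tokens = _tokens(pointer)
    if not tokens:
        return value
    new_doc = copy.deepcopy(doc)
    parent = new_doc
    for token in tokens[:-1]:
        parent = _step(parent, token)
    last = tokens[-1]
    _step(parent, last)  # raises if the target does not exist yet
    if isinstance(parent, list):
        parent[int(last)] = value
    else:
        parent[last] = value
    return new_doc


doc = {
    "_id": "123abc",
    "_rev": "zyx987",
    "contact": {
        "name": "",
        "address": {"street": "Long Street", "nr": 123, "zip": "12345"},
    },
}
updated = replace(doc, "/contact/address/zip", "54321")
```

A server-side implementation would additionally have to check the supplied `rev` and write a new revision; the sketch only covers the pointer handling.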
>>> 
>>> 
>>> ## HTTP PATCH / JSON Diff
>>> 
>>> Another stab at a similar problem is HTTP PATCH with JSON Diff, but
>>> with the inherent problems of JSON normalisation, I'm leaning towards
>>> the JSONPointer variant as simpler, but I'd be open for this as well, if
>>> someone comes up with a good approach.
>>> 
>>> 
>>> ## GraphQL[3]
>>> 
>>> It's rather new, but getting good traction[4]. This would be a nice
>>> addition to our API. Somebody might already be hacking on this ;)
>>> 
>>> [3]: http://graphql.org
>>> [4]: http://githubengineering.com/the-github-graphql-api/
>>> 
>>> 
>>> ## Mango for Document Validation
>>> 
>>> The only place where we absolutely require writing JS is
>>> validate_doc_update functions. Some security behaviour can only be
>>> enforced there. With their inherent performance problems, I'd like to
>>> get doc validations out of the path of the query server and would love
>>> to find a way to validate document updates through Mango.
>>> 
>>> 
>>> ## Redesign Security System
>>> 
>>> Our security system has grown slowly and was not coherently designed. We
>>> should start over. I have many ideas and opinions, but they are out of
>>> scope for this. I think everybody here agrees that we can do better.
>>> This *very likely* will *not* include per-document ACLs as per the often
>>> stated issues with that approach in our data model.
>>> 
>>> * * *
>>> 
>>> 
>>> # Replication
>>> 
>>> This is our flagship feature of course, and there are a few things we
>>> can do better.
>>> 
>>> 
>>> ## Mobile-optimised extension or new version of the protocol
>>> 
>>> The original protocol design didn't take mobile devices into account
>>> and through PouchDB et al. we are now learning that there are a number of
>>> downsides to our protocol. We've helped a lot with introducing
>>> _bulk_get/_revs, but that's more a bandaid than a considered strategy ;)
>>> 
>>> That new version could also be HTTP2-only, to take advantage of the new
>>> connection semantics there.
>>> 
>>> 
>>> ## Easy way to skip deletes on sync
>>> 
>>> This one is self-explanatory, mobile clients usually don't need to sync
>>> deletes from a year ago first. Mango filters might already get us there,
>>> maybe we can do better.
>>> 
>>> 
>>> ## Sync a rolling subset
>>> 
>>> Say you always want to keep the last 90 days of email on a mobile
>>> device with optionally back-loading older documents on user-request. It
>>> is something I could see getting a lot of traction.
>>> 
>>> Today, this can be built on 1.x with clever use of _purge, but that's
>>> hardly a good experience. I don't know if it can be done in a cluster.
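
One way to picture the "last 90 days" window is as a Mango selector whose cutoff moves with the clock. The Python sketch below only builds that selector. The field name `received_at` is hypothetical, and it assumes documents carry a fixed-width UTC ISO 8601 timestamp string so that string order matches chronological order. Whether a given CouchDB version can filter replication with such a selector needs checking against its documentation.

```python
from datetime import datetime, timedelta, timezone


def rolling_window_selector(days, now=None, field="received_at"):
    """Build a Mango selector matching docs newer than `days` days ago.

    `field` is a hypothetical timestamp field, stored as e.g.
    "2016-09-27T12:00:00Z".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {field: {"$gte": cutoff}}


# Selector for "the last 90 days", relative to the current time.
selector = rolling_window_selector(90)
```

Note that this only decides what to send. Removing documents that have aged out of the window on the device is the part that currently needs the _purge tricks mentioned above.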
>>> 
>>> 
>>> ## Selective Sync
>>> 
>>> There might be other criteria than "last 90 days", so the more general
>>> solution to this problem class would be arbitrary (e.g. client-directed)
>>> selective sync, but this might be really hard as opposed to just very
>>> hard for the "last 90 days" one, so happy to punt on this first. But
>>> filters are generally not the answer, especially with large data sets.
>>> Maybe proper sync from views _changes is the answer.
>>> 
>>> 
>>> ## A _db_updates powered _replicator DB
>>> 
>>> Running thousands+ of replications on a server is not really resource
>>> friendly today, we should teach the replicator to only run replication
>>> on active databases via _db_updates. Somebody might already be looking
>>> into this one.
>>> 
>>> * * *
>>> 
>>> 
>>> # Storage
>>> 
>>> 
>>> ## Pluggable Storage Engines
>>> 
>>> Paul Davis already showed some work on allowing multiple different
>>> storage backends. I'd like to see this land.
>>> 
>>> ## Different Storage Backends
>>> 
>>> These don't all have to be supported by the main project, but I'd
>>> really like to see some experimentation with different backends like
>>> LevelDB[5]/RocksDB[6], InnoDB[7], SQLite[8], or a native-Erlang one that is
>>> optimised for space usage and not performance (I don't want to budge on
>>> safety). Similarly, it'd be fun to see if there is a compression format
>>> that we can use as a storage backend directly, so we get full-DB
>>> compression as opposed to just per-doc compression.
>>> 
>>> [5]: http://leveldb.org
>>> [6]: http://rocksdb.org
>>> [7]: https://en.wikipedia.org/wiki/InnoDB
>>> [8]: https://www.sqlite.org
>>> 
>>> * * *
>>> 
>>> 
>>> # Query
>>> 
>>> ## Teach Mango JOINs and result sorting
>>> 
>>> It's the natural path for query languages. We should make these happen.
>>> Once we have the basics, we might even be able to find a way to compile
>>> basic SQL into Mango, it's going to be glorious :)
>>> 
>>> 
>>> ## "No-JavaScript"-mode
>>> 
>>> I've hinted at this above, but I'd really like a way for users to use
>>> CouchDB productively without having to write a line of JavaScript. My
>>> main motivation is the poor performance characteristics of the Query
>>> Server (hello CGI[9]?). But even with one that is improved, it will
>>> always be faster to do any, say, filtering or validation operations in
>>> native Erlang. I don't know if we can expand Mango to cover all this,
>>> and I'm not really concerned about the specifics, as long as we get
>>> there.
>>> 
>>> Of course, for pro-users, the JS-variant will still be around.
>>> 
>>> [9]: https://en.wikipedia.org/wiki/Common_Gateway_Interface
>>> 
>>> 
>>> ## Query Server V2
>>> 
>>> We need to revamp the Query Server. It is hardcoded to an out-of-date
>>> version of SpiderMonkey and we are stuck with C-bindings that barely
>>> anyone dares to look at, let alone iterate on.
>>> 
>>> I believe the way forward is re-vamping the query server protocol to
>>> use streaming IO instead of blocking batches like we do now, and use
>>> JS-native implementation of the JS-side instead of C-bindings.
>>> 
>>> I'm partial to doing this straight in Node, because there is a ton of
>>> support for things we need already, and I believe we've solved the
>>> isolation issues required for secure MapReduce, but I'm happy to use any
>>> other thing as well, if it helps.
>>> 
>>> Other benefits would be support for emerging JS features that devs will
>>> want to use.
>>> 
>>> And we can have two modes: standalone QS like now, and embedded QS
>>> where, say, V8 is compiled into the Erlang VM. Not everybody will want
>>> to run this, but it'll be neat for those who do.
>>> 
>>> 
>>> * * *
>>> 
>>> 
>>> # Cluster
>>> 
>>> ## Rebalancing
>>> 
>>> With this we will be able to grow clusters one by one instead of
>>> hitting a wall when eventually each shard lives on a single machine.
>>> E.g. when you add a node to the cluster, all other nodes share 1/Nth of
>>> their data with the new node, and everything can keep going. Same for
>>> removing a node and shrinking the cluster.
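
The "every node hands over about 1/Nth" step can be sketched as a small planning function. This is only an illustration in Python: the function `add_node`, the node and shard names and the one-copy-per-shard model are assumptions, and it ignores replicas and the actual data transfer entirely.

```python
def add_node(assignment, new_node):
    """Plan shard moves for adding `new_node` to a cluster.

    `assignment` maps node name -> list of shard names (one copy per shard,
    a simplification). Returns (new_assignment, moves), where each move is
    a (shard, from_node, to_node) tuple.
    """
    result = {node: list(shards) for node, shards in assignment.items()}
    result[new_node] = []
    total = sum(len(shards) for shards in result.values())
    target = total // len(result)  # fair share for the new node
    moves = []
    while len(result[new_node]) < target:
        # Always take from the currently fullest existing node.
        donor = max(
            (node for node in result if node != new_node),
            key=lambda node: len(result[node]),
        )
        shard = result[donor].pop()
        result[new_node].append(shard)
        moves.append((shard, donor, new_node))
    return result, moves


# Hypothetical 3-node cluster with 12 shards, growing to 4 nodes.
before = {
    f"node{n}": [f"shard-{n}-{i}" for i in range(4)] for n in (1, 2, 3)
}
after, moves = add_node(before, "node4")
```

In this example each existing node gives up one of its four shards, so only three of the twelve shards move and every node ends up with three.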
>>> 
>>> Couchbase has this and it is really nice.
>>> 
>>> 
>>> ## Setup
>>> 
>>> Even without rebalancing, we need a nice Fauxton UI to manage the
>>> cluster, so far we only have a simple setup procedure (which is great
>>> donąt get me wrong), but users will want to do more elaborate cluster
>>> management and we should make that easy with a slick UI.
>>> 
>>> 
>>> ## Cluster-Aware Clients
>>> 
>>> This might end up being not a good idea, but I'd like some
>>> experimentation here. Say you'd have a CouchDB client that could be
>>> hooked into the cluster topology so it'd know which nodes to query for
>>> which data, then we can save a proxy-hop, and build clients that have
>>> lower-latency access to CouchDB. Again, this is something that Couchbase
>>> does and I think is worth exploring.
>>> 
>>> 
>>> 
>>> * * *
>>> 
>>> 
>>> # Fauxton
>>> 
>>> Fauxton is great, but it could be better too, I think. I'm mostly
>>> concerned about number of clicks/taps required for more specialised
>>> actions (like setting the group_level of a reduce query, it's like 15 or
>>> so). More cluster info would also be nice, and maybe a specialised
>>> dashboard for db-per-user setups.
>>> 
>>> 
>>> * * *
>>> 
>>> 
>>> # Releases
>>> 
>>> 
>>> ## Six-Week Release Trains
>>> 
>>> We need to get back to frequent releases and I propose to go back to
>>> our six-week-release train plans from three years ago. Whatever lands
>>> within a release train time frame goes out. The nature of the change
>>> dictates the version number increment as per semver, and we just ship a
>>> new version every six weeks, even if it only includes a single bug fix.
>>> We should automate most of this infrastructure, so actual releases are
>>> cheap. We are reasonably close with this, but we need some more folks to
>>> step up on using and maintaining our CI systems.
>>> 
>>> 
>>> ## One major feature per major version
>>> 
>>> I also propose to keep the scope of future major versions small, so we
>>> don't have to wait another 3-5 years for 3.0. In particular, I think we
>>> should focus on a single major feature per major version and get that
>>> shipped within 6-12 months tops. If anything needs more time, it needs
>>> to be broken up. Of course we continue to add features and fix things
>>> while this happens, but as a project, there is *one* major feature we
>>> push. For example, for 3.0 I see our push be behind HTTP2 support. There
>>> is a lot of subsequent work required to make that happen, so it'll be a
>>> worthwhile 3.0, but we can ship it in 6-12 months (hopefully).
>>> 
>>> Best case scenario, we have CouchDB 4.0 coming out 12 months from now
>>> with two new major features. That would be amazing.
>>> 
>>> 
>>> * * *
>>> 
>>> 
>>> # Performance
>>> 
>>> ## Perf Team
>>> 
>>> We need a team to comprehensively look at CouchDB performance. There is a
>>> lot of low-hanging fruit like Robert Kowalski showed a while back, we
>>> should get back into this. I'm mostly inspired by SQLite, who've done a
>>> release a while back that only focussed on 1-2% performance
>>> improvements, but got like 20-30 of those and made the thing a lot
>>> faster across the board. I can't remember where I read about this, but
>>> I'll update this once I find the link.
>>> 
>>> 
>>> ## Benchmark Suite
>>> 
>>> We need a benchmark suite that tests a variety of different work loads.
>>> The goal here is to run different versions of CouchDB against the same
>>> suite on the same hardware, to see where we are going. I'm imagining a
>>> http://arewefastyet.com style dashboard where we can track this, and
>>> even run this on Pull Requests and not allow them if they significantly
>>> impact performance.
>>> 
>>> 
>>> ## Synthetic Load Suite
>>> 
>>> This one is for end users. I'd like to be able to say: My app produces
>>> mostly 10-20kb-sized docs, but millions of those in a single database,
>>> or across 1000s of databases, with these views etc. and then run this on
>>> target hardware so I'd know, e.g. how many nodes I need for a cluster
>>> with my estimated workload. I know this can only be done in
>>> approximation, but I think this could make a big difference in CouchDB
>>> adoption and feed back into Perf Team mentioned above.
>>> 
>>> * * *
>>> 
>>> 
>>> # Internals
>>> 
>>> ## Consolidate Repositories
>>> 
>>> With 2.0 we started to experiment with radically small modules for our
>>> components and I think we've come to the conclusion that some
>>> consolidation is better for us going forward. Obvious candidates for
>>> separate repos are docs, Fauxton etc. but also some of the Erlang
>>> modules that other projects reasonably would use.
>>> 
>>> 
>>> ## Elixir
>>> 
>>> I'd like it very much if we elevate Elixir as a prime target language
>>> for writing CouchDB internals. I believe this would get us an influx of
>>> new developers that we badly need to get all the things I'm listing here
>>> done. Somebody might be looking into the technical aspects of this
>>> already, but we need to decide as a project if we are okay with that.
>>> 
>>> 
>>> ## GitHub Issues
>>> 
>>> I hope we can transition to GitHub Issues soon.
>>> 
>>> * * *
>>> 
>>> 
>>> # Builds
>>> 
>>> I'd like automated builds for source, Docker et al., rpm, deb, brew,
>>> ports, Mac Binary, etc with proper release channels for people to
>>> subscribe to, all powered by CI for nightly builds, so people can test
>>> in-development versions easily.
>>> 
>>> I'd also like builds that include popular community plugins like Geo or
>>> Fulltext Search.
>>> 
>>> 
>>> 
>>> * * *
>>> 
>>> 
>>> # Features
>>> 
>>> ## Better Support for db-per-user
>>> 
>>> I don't know what this will look like, but this is a pattern, and we
>>> need to support it better.
>>> 
>>> One approach could be "virtual dbs" that are backed by a single
>>> database, but that's usually at odds with views, so we could make this
>>> an XOR and disable views on these dbs. Since this usually powers
>>> client-heavy apps, querying usually happens there anyway.
>>> 
>>> Another approach would be better / easier cross-db aggregation or
>>> querying. There are a few approaches, but nothing really slick.
>>> 
>>> 
>>> ## Schema Extraction
>>> 
>>> I have half an (old) patch that extracts top level fields from a
>>> document and stores them with a hash in an "attachment" to the database
>>> header. So we only end up storing doc values and the schema hash. First
>>> of all this trades storage for CPU time (I haven't measured anything
>>> yet), but more interestingly, we could use that schema data to do smart
>>> things like auto-generating a validation function / mango expression
>>> based on the data that is already in the database. And other fun things
>>> like easier schema migration operations that are native in CouchDB and
>>> thus a lot faster than external ones. For the curious ones, I've got the
>>> idea from V8's property access optimisation strategy[10].
>>> 
>>> [10]:
>>> https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
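
The schema-extraction idea can be illustrated with a small Python sketch. It is not the patch mentioned above: the functions `split_schema`, `store` and `load`, the SHA-256 hash and the in-memory dict and list standing in for the database header and document storage are all assumptions for the example.

```python
import hashlib
import json


def split_schema(doc):
    """Split a document into (schema_hash, field_names, values).

    Only top-level field names form the schema; sorting makes the hash
    independent of key order.
    """
    fields = sorted(doc)
    schema_hash = hashlib.sha256(json.dumps(fields).encode("utf-8")).hexdigest()
    values = [doc[name] for name in fields]
    return schema_hash, fields, values


def store(doc, schemas, rows):
    # `schemas` stands in for the per-database schema table,
    # `rows` for the stored documents (schema hash plus bare values).
    schema_hash, fields, values = split_schema(doc)
    schemas.setdefault(schema_hash, fields)  # kept once per distinct shape
    rows.append((schema_hash, values))


def load(row, schemas):
    # Rebuild the document by zipping the shared field list with its values.
    schema_hash, values = row
    return dict(zip(schemas[schema_hash], values))


schemas, rows = {}, []
store({"_id": "a", "type": "contact", "name": "Ann"}, schemas, rows)
store({"name": "Bob", "_id": "b", "type": "contact"}, schemas, rows)
```

Both documents share one schema entry here, and the keys of `schemas` double as a catalogue of the document shapes present in the database, which is the data a generated validation function could be built from.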
>>> 
>>> * * *
>>> 
>>> Alright, that's it for now. Can't wait for your feedback!
>>> 
>>> Best
>>> Jan
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
>> 

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
