James, awesome points, thank you for sharing. Would love to talk details at some point :)

> On 27 Sep 2016, at 20:16, Mutton, James <[email protected]> wrote:
>
> # Query-server side: I've done some performance testing on some very, very complicated (read "never should have been written") design docs and found very little performance gain from switching between the C bindings and Node. The biggest gain there is in memory management. I've watched the SM (SpiderMonkey) view server fragment memory until nothing more could be malloc'd, while a Node view server happily chugs along under 2-3 times the abuse before crashing, but it's no faster. The biggest gain is in staying very far away from the double serialization/deserialization and the FCGI-like interface, and going native.
>
> # _jsonpointer (we call it Fields/Filters): We also ended up doing this a while ago, but we broke it into two tangential APIs. The first was what we called the fields API, where you'd do /DB/DOC/_fields/FIELDNAME to retrieve a sub-value from that field, located with a JSONPath expression. We also added a ?filter=FILTERNAME option to run the output through a native filter stage that one could plug Erlang modules into. This lets you compose some interesting things, like sending a protobuf as binary converted from a document, storing a small object value as base64 and retrieving it as binary without dealing with attachments, or encrypting something with an asymmetric key and natively/transparently decrypting or signing it, etc.
>
> # Wide-clusters (probably a variant on cluster-aware clients): Scaling out clustered Couch to "very large" ops per second is an interesting experience. The basic premise that clusters contain all of the database is nice, in that as you create copies of your data you get always-close behaviors, but there's a limit to individual cluster performance.
> Depending on the load balancing involved in reaching your database, how spread out your installation/clients are, and how much hot-spotting you experience, you can much more easily end up in cases where reads/writes have to be shunted to another cluster, and because of replication delay, apps start to see strange behavior. One thing I've considered doing recently is using a transparent routing proxy on top of Couch that takes a provisioning configuration to locate the clusters containing a specific database and uses consistent hashing to spread the keyspace transactions into predictable buckets, hinting back to the client which cluster buckets were used. While not necessarily a Couch-specific feature, and still subject to its own nuances, it's a useful lesson in scaling for the inevitable limit with 2.0 installations.
>
> </JamesM>
>
> On 9/27/16, 7:57, "Adam Kocoloski" <[email protected]> wrote:
>
>> Wow, thanks for kicking this off Jan. Lots of good ideas in that list. I have a few additional ideas:
>>
>> # Containers and Package Management
>>
>> Deploying an Erlang-based system can still be an unfriendly exercise. Rather than redouble our efforts to play nice with all variants of distro-specific package managers out there, let's do the following: a) deliver an official Snap package (thanks Michael!) for those who still want Linux packages, and b) plug our Docker image into the popular container orchestration frameworks. I want to see one-touch cluster(able) deployments in Kubernetes, DC/OS Universe, and Docker Compose / DAB. Let's promote these options as the preferred ways to get up and running with CouchDB.
>>
>> # Tombstone Curation
>>
>> You touched on this with some of your thoughts on Replication, but I'd like to investigate ways to excise tombstones safely from existing databases.
>> We know that documents with wide revision trees become very unwieldy, and that the best practices around conflict management do nothing to help address this. Rather than asking the user to go in and manually purge records, can we compute when it's safe to automatically prune a deleted edit branch?
>>
>> # Database Archival
>>
>> Clustering is great and all, but sometimes one just wants to get old data out of the database and into some cheaper storage. Many IoT historian use cases fall into this bucket. This work could take a lot of different forms, from simple whole-database archival to more subtle policy-based archiving within a database.
>>
>> # Integrations - Object Storage, Kafka, Spark
>>
>> Most of the pluggable storage engines you mentioned will not be happy about storing large attachments. We started a bit of work to optionally offload those attachments into an object store like S3 or Swift and keep just the metadata in CouchDB; I'd like to see that through. I'd also like to establish a stronger linkage with a few of our Apache brethren. Enabling a _changes feed to be published in Kafka (and a Kafka topic to be loaded into a database) will help Couch play in more sophisticated data processing pipelines. On the Spark side we've already written code that can be used to expose CouchDB as an external datasource, but there are still some significant optimizations that we can apply (in the vein of the cluster-aware clients mentioned below).
>>
>> Adam
>>
>>> On Sep 27, 2016, at 5:56 AM, Jan Lehnardt <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> apologies in advance, this is going to be a long email.
>>>
>>> I've been holding this back intentionally in order to be able to focus on shipping 2.0, but now that that's out, I feel we should talk about what's next.
>>>
>>> This email is separated into areas of work that I think CouchDB could improve on, some with very concrete plans, some with rather vague ideas. I've been collecting these over the past year or <strike>two</strike>five, so it's fairly wide, but I'm sure I'm missing things that other people find important, so please add to this list.
>>>
>>> After the initial discussion here, I'll move all of the individual issues to JIRA, so we can go down our usual process.
>>>
>>> This is basically my wish list, and I'd like this to become everyone's wish list, so please add what I've been missing. :) Note, this isn't a free-for-all: only suggest things that you are prepared to see through to being shipped, from design and implementation to docs.
>>>
>>> I don't have a specific order for these in mind, although I have a rough idea of what we should be doing first. Putting all of this on a roadmap is going to be a fun future exercise for us, though :)
>>>
>>> One last note: this doesn't include anything on documentation or testing. I fully expect us to step up our game from here on out. This list is for the technical aspects of the project.
>>>
>>> * * *
>>>
>>> These are the areas of work I've roughly come up with that my suggestions fit into:
>>>
>>> - API
>>> - Storage
>>> - Query
>>> - Replication
>>> - Cluster
>>> - Fauxton
>>> - Releases
>>> - Performance
>>> - Internals
>>> - Builds
>>> - Features
>>>
>>> (I'm not claiming these are any good, but it's what I've got)
>>>
>>> Let's go.
>>>
>>> * * *
>>>
>>> # API
>>>
>>> ## HTTP2
>>>
>>> I think this is an obvious first next step. Our HTTP layer needs work, and our existing HTTP server library is not getting HTTP2 support, so it's time to attack this head-first. I'm imagining a Cowboy[1]-based HTTP layer that calls into a unified internals layer, and everything will be rose-golden. HTTP2 support for Cowboy is still in progress.
>>> Maybe we can help them along, or we focus on the internals refactor first and drop Cowboy in later (not sure how feasible this approach is, but we'll figure this out).
>>>
>>> In my head, we focus on this and call the result 3.0 in 6-12 months. That doesn't mean we *only* do this, but this will be the focus (more on this later).
>>>
>>> There are a few fun considerations, mainly of the "avoid the Python 2/3 chasm" type. Do we re-implement the 2.0 API with all its idiosyncrasies, or do we take the opportunity to clean things up while we are at it? If we clean up, how and for how long do we support the then-old API? Do we manage this via different ports? If yes, how can this be made to work for hosting services like Cloudant? Etc. etc.
>>>
>>> [1]: https://github.com/ninenines/cowboy
>>>
>>> ## Sub-Document Operations
>>>
>>> Currently a doc update needs the whole doc body sent to the server. There are some obvious performance improvements possible. For the longest time, I wanted to see if we can model sub-document operations via JSON Pointers[2]. These would roughly allow pointing to a JSON value via a URL.
>>>
>>> For example, in this doc:
>>>
>>>     {
>>>       "_id": "123abc",
>>>       "_rev": "zyx987",
>>>       "contact": {
>>>         "name": "",
>>>         "address": {
>>>           "street": "Long Street",
>>>           "nr": 123,
>>>           "zip": "12345"
>>>         }
>>>       }
>>>     }
>>>
>>> An update to the zip code could look like this:
>>>
>>>     curl -X POST "$SERVER/db/123abc/_jsonpointer/contact/address/zip?rev=zyx987" -d '"54321"'
>>>
>>> GET/DELETE accordingly. We could shortcut `_jsonpointer` to just `_` if we like the short magic.
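To make the JSON Pointer idea concrete, here is a minimal Python sketch of RFC 6901 pointer resolution plus a replace operation of the kind the proposed `_jsonpointer` endpoint would perform on the server. The function names (`parse_pointer`, `resolve`, `replace`) are hypothetical and not part of CouchDB or of Jan's erl-jsonpointer module; only the pointer syntax (`/`-separated tokens, `~1` for `/`, `~0` for `~`, decimal indices for arrays) comes from the RFC.

```python
import copy
from typing import Any, List


def parse_pointer(pointer: str) -> List[str]:
    """Split an RFC 6901 pointer like "/contact/address/zip" into reference tokens."""
    if pointer == "":
        return []  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # Decode "~1" before "~0", as the RFC requires, so "~01" becomes "~1".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _index(token: str) -> int:
    """Array indices must be plain decimal digits without leading zeros."""
    if not (token.isascii() and token.isdigit()) or (len(token) > 1 and token[0] == "0"):
        raise IndexError(f"invalid array index: {token!r}")
    return int(token)


def _child(container: Any, token: str) -> Any:
    """Descend one level: by key into an object, by index into an array."""
    if isinstance(container, list):
        return container[_index(token)]
    if isinstance(container, dict):
        return container[token]
    raise TypeError("cannot descend into a scalar value")


def resolve(doc: Any, pointer: str) -> Any:
    """GET: return the value the pointer refers to."""
    value = doc
    for token in parse_pointer(pointer):
        value = _child(value, token)
    return value


def replace(doc: Any, pointer: str, new_value: Any) -> Any:
    """Update: return a copy of doc with the existing value at pointer replaced."""
    tokens = parse_pointer(pointer)
    if not tokens:
        return copy.deepcopy(new_value)
    result = copy.deepcopy(doc)
    parent = result
    for token in tokens[:-1]:
        parent = _child(parent, token)
    last = tokens[-1]
    if isinstance(parent, list):
        parent[_index(last)] = new_value
    elif isinstance(parent, dict) and last in parent:
        parent[last] = new_value
    else:
        raise KeyError(f"no value at {pointer!r}")
    return result


doc = {
    "_id": "123abc",
    "_rev": "zyx987",
    "contact": {"name": "", "address": {"street": "Long Street", "nr": 123, "zip": "12345"}},
}
updated = replace(doc, "/contact/address/zip", "54321")
print(resolve(updated, "/contact/address/zip"))  # 54321
print(resolve(doc, "/contact/address/zip"))      # 12345 (the original is untouched)
```

Returning a modified copy instead of mutating in place loosely mirrors how such an update would produce a new revision; the `?rev=` conflict check and the revision bump are deliberately left out of this sketch.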
>>>
>>> JSONPointer can deal with nested objects and lists, works fairly well for this type of stuff, and is rather simple to implement; even I could do it (https://github.com/janl/erl-jsonpointer/blob/master/src/jsonpointer.erl). This idea is literally 5 years old, it looks like, so no need to use my code if there is anything better.
>>>
>>> This is just a raw idea, and I'm happy to solve this any other way, if somebody has a good approach.
>>>
>>> [2]: https://tools.ietf.org/html/rfc6901
>>>
>>> ## HTTP PATCH / JSON Diff
>>>
>>> Another stab at a similar problem is HTTP PATCH with JSON Diff, but given the inherent problems of JSON normalisation, I'm leaning towards the JSONPointer variant as simpler. I'd be open to this as well, though, if someone comes up with a good approach.
>>>
>>> ## GraphQL[3]
>>>
>>> It's rather new, but getting good traction[4]. This would be a nice addition to our API. Somebody might already be hacking on this ;)
>>>
>>> [3]: http://graphql.org
>>> [4]: http://githubengineering.com/the-github-graphql-api/
>>>
>>> ## Mango for Document Validation
>>>
>>> The only place where we absolutely require writing JS is validate_doc_update functions. Some security behaviour can only be enforced there. Given their inherent performance problems, I'd like to get doc validations out of the path of the query server and would love to find a way to validate document updates through Mango.
>>>
>>> ## Redesign Security System
>>>
>>> Our security system has grown slowly and was not coherently designed. We should start over. I have many ideas and opinions, but they are out of scope for this email. I think everybody here agrees that we can do better. This *very likely* will *not* include per-document ACLs, given the often-stated issues with that approach in our data model.
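The "Mango for Document Validation" idea would swap a JavaScript validate_doc_update function for a declarative selector checked natively. As a rough illustration only, the Python sketch below evaluates a small subset of Mango-style operators (`$eq`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$exists`, `$type`, implicit equality, nested field objects) against an incoming document. It is not CouchDB's Mango implementation; the `Forbidden` exception and the `validate_doc_update(new_doc, selector)` signature are hypothetical, and how such a selector would be attached to a design document is left open.

```python
import operator
from typing import Any, Dict

_MISSING = object()  # sentinel: the field is absent from the document

# Type names follow the vocabulary of Mango's $type operator.
_TYPES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}

_COMPARE = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


class Forbidden(Exception):
    """Hypothetical stand-in for the 'forbidden' error a validation function raises."""


def _apply_op(op: str, arg: Any, value: Any) -> bool:
    if op == "$exists":
        return (value is not _MISSING) == bool(arg)
    if value is _MISSING:
        return False  # every other operator requires the field to be present
    if op == "$eq":
        return value == arg
    if op == "$in":
        return value in arg
    if op == "$type":
        return _TYPES[arg](value)
    if op in _COMPARE:
        try:
            return bool(_COMPARE[op](value, arg))
        except TypeError:
            return False  # e.g. comparing a string with a number never matches
    raise ValueError(f"operator not supported by this sketch: {op}")


def matches(value: Any, condition: Any) -> bool:
    """Evaluate a Mango-style condition against a whole document or a single field."""
    if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
        # Operator object, e.g. {"$type": "string", "$exists": True}: all must hold.
        return all(_apply_op(op, arg, value) for op, arg in condition.items())
    if isinstance(condition, dict):
        # Field object: every listed sub-field must match its own condition.
        if not isinstance(value, dict):
            return False
        return all(matches(value.get(field, _MISSING), cond) for field, cond in condition.items())
    # A bare value means implicit equality.
    return value is not _MISSING and value == condition


def validate_doc_update(new_doc: Dict[str, Any], selector: Dict[str, Any]) -> None:
    if not matches(new_doc, selector):
        raise Forbidden("document does not match the validation selector")


rule = {
    "type": {"$eq": "contact"},
    "contact": {
        "name": {"$type": "string"},
        "address": {"zip": {"$type": "string", "$exists": True}},
    },
}
ok_doc = {"_id": "123abc", "type": "contact",
          "contact": {"name": "", "address": {"street": "Long Street", "nr": 123, "zip": "12345"}}}
bad_doc = {"_id": "123abc", "type": "contact",
           "contact": {"name": "", "address": {"street": "Long Street", "nr": 123, "zip": 54321}}}

validate_doc_update(ok_doc, rule)  # passes silently
try:
    validate_doc_update(bad_doc, rule)
except Forbidden as err:
    print("rejected:", err)
```

Because the rule is plain data, a server could evaluate it without a round trip to the query server, which is the performance point of the proposal; covering everything JS validation functions can do (for example, checks against the user context) is a separate question.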
>>>
>>> * * *
>>>
>>> # Replication
>>>
>>> This is our flagship feature of course, and there are a few things we can do better.
>>>
>>> ## Mobile-optimised extension or new version of the protocol
>>>
>>> The original protocol design didn't take mobile devices into account, and through PouchDB et al. we are now learning that there are a number of downsides to our protocol. We've helped a lot by introducing _bulk_get/_revs, but that's more a band-aid than a considered strategy ;)
>>>
>>> That new version could also be HTTP2-only, to take advantage of the new connection semantics there.
>>>
>>> ## Easy way to skip deletes on sync
>>>
>>> This one is self-explanatory: mobile clients usually don't need to sync deletes from a year ago first. Mango filters might already get us there; maybe we can do better.
>>>
>>> ## Sync a rolling subset
>>>
>>> Say you always want to keep the last 90 days of email on a mobile device, optionally back-loading older documents on user request. It is something I could see getting a lot of traction.
>>>
>>> Today, this can be built on 1.x with clever use of _purge, but that's hardly a good experience. I don't know if it can be done in a cluster.
>>>
>>> ## Selective Sync
>>>
>>> There might be other criteria than "last 90 days", so the more general solution to this problem class would be arbitrary (e.g. client-directed) selective sync. But this might be really hard, as opposed to the merely very hard "last 90 days" case, so I'm happy to punt on this at first. Filters are generally not the answer, though, especially with large data sets. Maybe proper sync from view _changes feeds is the answer.
>>>
>>> ## A _db_updates powered _replicator DB
>>>
>>> Running thousands of replications or more on a server is not really resource-friendly today; we should teach the replicator to only run replications on active databases via _db_updates.
>>> Somebody might already be looking into this one.
>>>
>>> * * *
>>>
>>> # Storage
>>>
>>> ## Pluggable Storage Engines
>>>
>>> Paul Davis already showed some work on allowing multiple different storage backends. I'd like to see this land.
>>>
>>> ## Different Storage Backends
>>>
>>> These don't all have to be supported by the main project, but I'd really like to see some experimentation with different backends like LevelDB[5]/RocksDB[6], InnoDB[7], SQLite[8], or a native-Erlang one that is optimised for space usage rather than performance (I don't want to budge on safety). Similarly, it'd be fun to see if there is a compression format that we can use as a storage backend directly, so we get full-DB compression as opposed to just per-doc compression.
>>>
>>> [5]: http://leveldb.org
>>> [6]: http://rocksdb.org
>>> [7]: https://en.wikipedia.org/wiki/InnoDB
>>> [8]: https://www.sqlite.org
>>>
>>> * * *
>>>
>>> # Query
>>>
>>> ## Teach Mango JOINs and result sorting
>>>
>>> It's the natural path for query languages. We should make these happen. Once we have the basics, we might even be able to find a way to compile basic SQL into Mango; it's going to be glorious :)
>>>
>>> ## "No-JavaScript" mode
>>>
>>> I've hinted at this above, but I'd really like a way for users to use CouchDB productively without having to write a line of JavaScript. My main motivation is the poor performance characteristics of the Query Server (hello CGI[9]?). But even with an improved one, it will always be faster to do operations like filtering or validation in native Erlang. I don't know if we can expand Mango to cover all this, and I'm not really concerned about the specifics, as long as we get there.
>>>
>>> Of course, for pro users, the JS variant will still be around.
>>>
>>> [9]: https://en.wikipedia.org/wiki/Common_Gateway_Interface
>>>
>>> ## Query Server V2
>>>
>>> We need to revamp the Query Server.
>>> It is hardcoded to an out-of-date version of SpiderMonkey, and we are stuck with C bindings that barely anyone dares to look at, let alone iterate on.
>>>
>>> I believe the way forward is revamping the query server protocol to use streaming IO instead of blocking batches like we do now, and using a JS-native implementation of the JS side instead of C bindings.
>>>
>>> I'm partial to doing this straight in Node, because there is a ton of support for things we need already, and I believe we've solved the isolation issues required for secure MapReduce, but I'm happy to use anything else as well, if it helps.
>>>
>>> Other benefits would be support for emerging JS features that devs will want to use.
>>>
>>> And we can have two modes: a standalone QS like now, and an embedded QS where, say, V8 is compiled into the Erlang VM. Not everybody will want to run this, but it'll be neat for those who do.
>>>
>>> * * *
>>>
>>> # Cluster
>>>
>>> ## Rebalancing
>>>
>>> With this we will be able to grow clusters one node at a time instead of hitting a wall when eventually each shard lives on a single machine. E.g. when you add a node to the cluster, all other nodes share 1/Nth of their data with the new node, and everything can keep going. Same for removing a node and shrinking the cluster.
>>>
>>> Couchbase has this and it is really nice.
>>>
>>> ## Setup
>>>
>>> Even without rebalancing, we need a nice Fauxton UI to manage the cluster. So far we only have a simple setup procedure (which is great, don't get me wrong), but users will want to do more elaborate cluster management and we should make that easy with a slick UI.
>>>
>>> ## Cluster-Aware Clients
>>>
>>> This might end up not being a good idea, but I'd like some experimentation here.
>>> Say we had a CouchDB client that could be hooked into the cluster topology so it'd know which nodes to query for which data; then we could save a proxy hop and build clients that have lower-latency access to CouchDB. Again, this is something that Couchbase does and I think is worth exploring.
>>>
>>> * * *
>>>
>>> # Fauxton
>>>
>>> Fauxton is great, but it could be better too, I think. I'm mostly concerned about the number of clicks/taps required for more specialised actions (setting the group_level of a reduce query takes something like 15). More cluster info would also be nice, and maybe a specialised dashboard for db-per-user setups.
>>>
>>> * * *
>>>
>>> # Releases
>>>
>>> ## Six-Week Release Trains
>>>
>>> We need to get back to frequent releases, and I propose to go back to our six-week release train plans from three years ago. Whatever lands within a release train time frame goes out. The nature of the change dictates the version number increment as per semver, and we just ship a new version every six weeks, even if it only includes a single bug fix. We should automate most of this infrastructure, so actual releases are cheap. We are reasonably close with this, but we need some more folks to step up on using and maintaining our CI systems.
>>>
>>> ## One major feature per major version
>>>
>>> I also propose to keep the scope of future major versions small, so we don't have to wait another 3-5 years for 3.0. In particular, I think we should focus on a single major feature per major version and get that shipped within 6-12 months tops. If anything needs more time, it needs to be broken up. Of course we continue to add features and fix things while this happens, but as a project, there is *one* major feature we push. For example, for 3.0 I see our push being behind HTTP2 support.
>>> There is a lot of subsequent work required to make that happen, so it'll be a worthwhile 3.0, but we can ship it in 6-12 months (hopefully).
>>>
>>> Best case scenario, we have CouchDB 4.0 coming out 12 months from now with two new major features. That would be amazing.
>>>
>>> * * *
>>>
>>> # Performance
>>>
>>> ## Perf Team
>>>
>>> We need a team to comprehensively look at CouchDB performance. There is a lot of low-hanging fruit, as Robert Kowalski showed a while back; we should get back into this. I'm mostly inspired by SQLite, who did a release a while back that only focussed on 1-2% performance improvements, but got something like 20-30 of those and made the thing a lot faster across the board. I can't remember where I read about this, but I'll update this once I find the link.
>>>
>>> ## Benchmark Suite
>>>
>>> We need a benchmark suite that tests a variety of different workloads. The goal here is to run different versions of CouchDB against the same suite on the same hardware, to see where we are going. I'm imagining an http://arewefastyet.com-style dashboard where we can track this, and even run this on Pull Requests and not allow them if they significantly impact performance.
>>>
>>> ## Synthetic Load Suite
>>>
>>> This one is for end users. I'd like to be able to say: my app produces mostly 10-20 KB docs, but millions of those in a single database, or across 1000s of databases, with these views etc., and then run this on target hardware so I'd know, e.g., how many nodes I need for a cluster with my estimated workload. I know this can only be done in approximation, but I think this could make a big difference in CouchDB adoption and feed back into the Perf Team mentioned above.
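The Benchmark Suite idea of not allowing Pull Requests that significantly impact performance boils down to comparing a candidate run against a baseline, per workload. A minimal Python sketch, assuming each run yields a mapping from workload name to throughput samples in ops/s (higher is better) and using a hypothetical 5% threshold on the median; the workload names, numbers and threshold are made up for illustration and are not measurements.

```python
from statistics import median
from typing import Dict, List


def regressions(baseline: Dict[str, List[float]],
                candidate: Dict[str, List[float]],
                max_slowdown: float = 0.05) -> Dict[str, float]:
    """Return workloads whose median throughput dropped by more than max_slowdown.

    Values are relative changes, e.g. -0.12 means 12% fewer ops/s than baseline.
    """
    flagged = {}
    for workload, base_samples in baseline.items():
        if workload not in candidate:
            continue  # added or removed workloads are not compared in this sketch
        base = median(base_samples)
        cand = median(candidate[workload])
        change = (cand - base) / base
        if change < -max_slowdown:
            flagged[workload] = round(change, 3)
    return flagged


# Hypothetical numbers, only to show the shape of the comparison.
baseline = {"doc_writes": [1000, 1010, 990], "view_reads": [5000, 5100, 4900]}
candidate = {"doc_writes": [880, 900, 870], "view_reads": [5050, 5000, 5100]}
print(regressions(baseline, candidate))  # {'doc_writes': -0.12}
```

Comparing medians of repeated runs rather than single runs damps some of the noise that would otherwise make such a gate flaky; a CI job could fail the build whenever the returned dict is non-empty.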
>>>
>>> * * *
>>>
>>> # Internals
>>>
>>> ## Consolidate Repositories
>>>
>>> With 2.0 we started to experiment with radically small modules for our components, and I think we've come to the conclusion that some consolidation is better for us going forward. Obvious candidates for separate repos are docs, Fauxton etc., but also some of the Erlang modules that other projects would reasonably use.
>>>
>>> ## Elixir
>>>
>>> I'd like it very much if we elevated Elixir to a prime target language for writing CouchDB internals. I believe this would get us an influx of new developers that we badly need to get all the things I'm listing here done. Somebody might be looking into the technical aspects of this already, but we need to decide as a project if we are okay with that.
>>>
>>> ## GitHub Issues
>>>
>>> I hope we can transition to GitHub Issues soon.
>>>
>>> * * *
>>>
>>> # Builds
>>>
>>> I'd like automated builds for source, Docker et al., rpm, deb, brew, ports, Mac binary, etc., with proper release channels for people to subscribe to, all powered by CI for nightly builds, so people can test in-development versions easily.
>>>
>>> I'd also like builds that include popular community plugins like Geo or Fulltext Search.
>>>
>>> * * *
>>>
>>> # Features
>>>
>>> ## Better Support for db-per-user
>>>
>>> I don't know what this will look like, but this is a pattern, and we need to support it better.
>>>
>>> One approach could be "virtual dbs" that are backed by a single database, but that's usually at odds with views, so we could make this an XOR and disable views on these dbs. Since this usually powers client-heavy apps, querying usually happens there anyway.
>>>
>>> Another approach would be better / easier cross-db aggregation or querying. There are a few approaches, but nothing really slick.
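One possible reading of the "virtual dbs backed by a single database" idea: prefix every document ID with the owning user, so that reading one user's "database" is a range scan over a single sorted keyspace, and cross-user aggregation is one pass over that same keyspace. The Python sketch below simulates this with an in-memory sorted key list; the `VirtualDbStore` class and the `user:docid` key scheme are hypothetical, and a real implementation would more likely map onto key-range reads such as `_all_docs` with start/end keys than onto a Python list.

```python
import bisect
from typing import Dict, List, Tuple


class VirtualDbStore:
    """Many per-user 'virtual databases' sharing one physical, key-sorted store."""

    SEP = ":"  # hypothetical separator between user name and document ID

    def __init__(self) -> None:
        self._keys: List[str] = []       # kept sorted, like a B-tree's key order
        self._docs: Dict[str, dict] = {}

    def _key(self, user: str, doc_id: str) -> str:
        if self.SEP in user:
            raise ValueError("user name must not contain the separator")
        return f"{user}{self.SEP}{doc_id}"

    def put(self, user: str, doc_id: str, doc: dict) -> None:
        key = self._key(user, doc_id)
        if key not in self._docs:
            bisect.insort(self._keys, key)
        self._docs[key] = doc

    def all_docs(self, user: str) -> List[Tuple[str, dict]]:
        """Range scan over one user's prefix, i.e. one virtual db."""
        start = user + self.SEP
        end = user + chr(ord(self.SEP) + 1)  # smallest key past the "user:" prefix
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k[len(start):], self._docs[k]) for k in self._keys[lo:hi]]

    def count_by_user(self) -> Dict[str, int]:
        """Cross-'db' aggregation is a single pass over the shared keyspace."""
        counts: Dict[str, int] = {}
        for key in self._keys:
            user = key.split(self.SEP, 1)[0]
            counts[user] = counts.get(user, 0) + 1
        return counts


store = VirtualDbStore()
store.put("alice", "note-1", {"text": "hi"})
store.put("bob", "note-1", {"text": "yo"})
store.put("alice", "note-2", {"text": "again"})
print([doc_id for doc_id, _ in store.all_docs("alice")])  # ['note-1', 'note-2']
print(store.count_by_user())                              # {'alice': 2, 'bob': 1}
```

Note that the sketch has no per-user views at all, which matches the XOR trade-off described above: querying is expected to happen on the client, while the server only needs cheap prefix scans and whole-keyspace aggregation.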
>>>
>>> ## Schema Extraction
>>>
>>> I have half an (old) patch that extracts top-level fields from a document and stores them with a hash in an "attachment" to the database header, so we only end up storing doc values and the schema hash. First of all, this trades storage for CPU time (I haven't measured anything yet), but more interestingly, we could use that schema data to do smart things like auto-generating a validation function / Mango expression based on the data that is already in the database, and other fun things like easier schema migration operations that are native in CouchDB and thus a lot faster than external ones. For the curious ones, I got the idea from V8's property access optimisation strategy[10].
>>>
>>> [10]: https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
>>>
>>> * * *
>>>
>>> Alright, that's it for now. Can't wait for your feedback!
>>>
>>> Best
>>> Jan
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>>
>>

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
