Hey Jan,

The SQLite perf improvement article you're looking for is this one:
<https://web.archive.org/web/20160305085922/http://permalink.gmane.org/gmane.comp.db.sqlite.general/90549>.
Unfortunately the original site seems to be down, but thank glob for
archive.org. :)

I agree that perf improvements in replication would be a good area to look
into. This is a major differentiator for e.g. Firebase, which does
everything over a single WebSocket connection. When PouchDB users look at
the Network tab, their first reaction is often horror at the sheer number of
HTTP requests. _bulk_get has helped quite a bit (or has it? I haven't seen
many benchmarks on this), but HTTP/2 or WebSockets would be a big leap
forward.

It's also unfortunate that we're still using base64+JSON for attachments,
because we're unable to configure CouchDB multipart/mixed to give us
unzipped attachments. (There's no way to unzip individual multipart
components in the browser, modulo some awful thing like zip.js, which would
add gobs of CPU time.)

Cheers,
Nolan

On Tue, Sep 27, 2016 at 12:24 PM, Jan Lehnardt <[email protected]> wrote:

> James, awesome points, thank you for sharing, would love to talk details
> at some point :)
>
> > On 27 Sep 2016, at 20:16, Mutton, James <[email protected]> wrote:
> >
> > # Query-server side : I've done some performance testing on some
> > very/very complicated (read "never should have been written") design
> > docs and found very little performance gain just between c-bindings and
> > Node. The biggest gain there is in memory management. I've watched the
> > SM view server fragment the memory to be unmallocable while a node view
> > server happily chugs along to 2-3 times the abuse before crashing, but
> > it's no faster. The biggest gain is in staying very far away from the
> > double serialization/deserialization and the FCGI-like interface and
> > getting it native.
> >
> > # _jsonpointer (we call it Fields/Filters) : We also ended up doing this
> > a while ago but we broke it into 2 tangential APIs. First was what we
> > called the fields api where you'd do /DB/DOC/_fields/FIELDNAME to
> > retrieve a sub-value from that field located with a JSONPath expression.
> > We also tangentially added a ?filter=FILTERNAME option to run the output
> > through a native filter-stage that one could plug Erlang modules into.
> > This lets you compose some interesting stuff like send a protobuf as
> > binary converted from a document, store some small object value as
> > base64 and retrieve it binary without dealing with attachments, or
> > encrypt something using an asymmetric key and natively/transparently
> > decrypt or sign it etc...
> >
> > # Wide-clusters (probably a variant on cluster-aware clients) : Scaling
> > out clustered-couch to "very-large" ops per second is an interesting
> > experience. The basic premise that clusters contain all of the database
> > is nice in that as you create copies of your data you get always-close
> > behaviors, but there's a limit to individual cluster performance.
> > Depending on the load-balancing involved in reaching your database, how
> > spread your installation/clients are and how much hot-spotting you
> > experience, you can much more easily end up in cases where reads/writes
> > have to be shunted to another cluster and because of replication-delay
> > apps start to see strange behavior. One thing I've considered doing
> > recently is using a transparent routing proxy on-top of couch that takes
> > a provisioning configuration to locate clusters containing a specific
> > database and use consistent-hashing to spread the keyspace transactions
> > into predictable buckets, hinting back to the client what
> > cluster-buckets were used. While not necessarily a couch-specific
> > feature, and still subject to its own nuances, it's a useful lesson in
> > scaling for the inevitable limit with 2.0 installations.
> >
> > </JamesM>
> >
> > On 9/27/16, 7:57, "Adam Kocoloski" <[email protected]> wrote:
> >
> >> Wow, thanks for kicking this off Jan. Lots of good ideas in that list.
> >> I have a few additional ideas:
> >>
> >> # Containers and Package Management
> >>
> >> Deploying an Erlang-based system can still be an unfriendly exercise.
> >> Rather than redouble our efforts to play nice with all variants of
> >> distro-specific package managers out there, let's do the following: a)
> >> deliver an official Snap package (thanks Michael!) for those who still
> >> want Linux packages, and b) plug our Docker image into the popular
> >> container orchestration frameworks. I want to see one-touch
> >> cluster(able) deployments in Kubernetes, DC/OS Universe, and Docker
> >> Compose / DAB. Promote these options as preferred ways to get up and
> >> running with CouchDB.
> >>
> >> # Tombstone Curation
> >>
> >> You touched on this with some of your thoughts on Replication, but I'd
> >> like to investigate ways to excise tombstones safely from existing
> >> databases. We know that documents with wide revision trees become very
> >> unwieldy, and that the best practices around conflict management do
> >> nothing to help address this. Rather than ask the user to go in and
> >> manually purge records, can we compute when it's safe to automatically
> >> prune a deleted edit branch?
> >>
> >> # Database Archival
> >>
> >> Clustering is great and all, but sometimes one just wants to get old
> >> data out of the database and into some cheaper storage. Many IoT
> >> historian use cases fall into this bucket. This work could take a lot
> >> of different forms, from simple whole-database archival to more subtle
> >> policy-based archiving within a database.
> >>
> >> # Integrations - Object Storage, Kafka, Spark
> >>
> >> Most of the pluggable storage engines you mentioned will not be happy
> >> about storing large attachments. We started a bit of work to optionally
> >> offload those attachments into an object store like S3 or Swift and keep
> >> just the metadata in CouchDB; I'd like to see that through.
> >> I'd also like to establish a stronger linkage with a few of our Apache
> >> brethren. Enabling a _changes feed to be published in Kafka (and a
> >> Kafka topic to be loaded into a database) will help Couch play in more
> >> sophisticated data processing pipelines. On the Spark side we've
> >> already written code that can be used to expose CouchDB as an external
> >> datasource, but there are still some significant optimizations that we
> >> can apply (in the vein of the cluster-aware clients mentioned below).
> >>
> >> Adam
> >>
> >>> On Sep 27, 2016, at 5:56 AM, Jan Lehnardt <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> apologies in advance, this is going to be a long email.
> >>>
> >>> I've been holding this back intentionally in order to be able to focus
> >>> on shipping 2.0, but now that that's out, I feel we should talk about
> >>> what's next.
> >>>
> >>> This email is separated into areas of work that I think CouchDB could
> >>> improve on, some with very concrete plans, some with rather vague
> >>> ideas. I've been collecting these over the past year or
> >>> <strike>two</strike>five, so it's fairly wide, but I'm sure I'm missing
> >>> things that other people find important, so please add to this list.
> >>>
> >>> After the initial discussion here, I'll move all of the individual
> >>> issues to JIRA, so we can go down our usual process.
> >>>
> >>> This is basically my wish list, and I'd like this to become everyone's
> >>> wish list, so please add what I've been missing. :) -- Note, this isn't
> >>> a free-for-all, only suggest things that you are prepared to see
> >>> through being shipped, from design, implementation to docs.
> >>>
> >>> I don't have a specific order for these in mind, although I have a
> >>> rough idea of what we should be doing first.
> >>> Putting all of this on a roadmap is going to be a fun future exercise
> >>> for us, though :)
> >>>
> >>> One last note: this doesn't include anything on documentation or
> >>> testing. I fully expect to step up our game from here on out. This
> >>> list is for the technical aspects of the project.
> >>>
> >>> * * *
> >>>
> >>> These are the areas of work I've roughly come up with that my
> >>> suggestions fit into:
> >>>
> >>> - API
> >>> - Storage
> >>> - Query
> >>> - Replication
> >>> - Cluster
> >>> - Fauxton
> >>> - Releases
> >>> - Performance
> >>> - Internals
> >>> - Builds
> >>> - Features
> >>>
> >>> (I'm not claiming these are any good, but it's what I've got)
> >>>
> >>> Let's go.
> >>>
> >>> * * *
> >>>
> >>> # API
> >>>
> >>> ## HTTP2
> >>>
> >>> I think this is an obvious first next step. Our HTTP Layer needs work,
> >>> our existing HTTP server library is not getting HTTP2 support, it's
> >>> time to attack this head-first. I'm imagining a Cowboy[1]-based HTTP
> >>> layer that calls into a unified internals layer and everything will be
> >>> rose-golden. HTTP2 support for Cowboy is still in progress. Maybe we
> >>> can help them along, or we focus on the internals refactor first and
> >>> drop Cowboy in later (not sure how feasible this approach is, but
> >>> we'll figure this out).
> >>>
> >>> In my head, we focus on this and call the result 3.0 in 6-12 months.
> >>> That doesn't mean we *only* do this, but this will be the focus (more
> >>> on this later).
> >>>
> >>> There are a few fun considerations, mainly of the "avoid Python
> >>> 2/3-chasm"-type. Do we re-implement the 2.0 API with all its
> >>> idiosyncrasies, or do we take the opportunity to clean things up while
> >>> we are at it? If yes, how and how long do we support the then old API?
> >>> Do we manage this via different ports? If yes, how can this be made to
> >>> work for hosting services like Cloudant? Etc. etc.
> >>>
> >>> [1] https://github.com/ninenines/cowboy
> >>>
> >>>
> >>> ## Sub-Document Operations
> >>>
> >>> Currently a doc update needs the whole doc body sent to the server.
> >>> There are some obvious performance improvements possible. For the
> >>> longest time, I wanted to see if we can model sub-document operations
> >>> via JSON Pointers[2]. These would roughly allow pointing to a JSON
> >>> value via a URL.
> >>>
> >>> For example in this doc:
> >>>
> >>> {
> >>>   "_id": "123abc",
> >>>   "_rev": "zyx987",
> >>>   "contact": {
> >>>     "name": "",
> >>>     "address": {
> >>>       "street": "Long Street",
> >>>       "nr": 123,
> >>>       "zip": "12345"
> >>>     }
> >>>   }
> >>> }
> >>>
> >>> An update to the zip code could look like this:
> >>>
> >>> curl -X POST \
> >>>   "$SERVER/db/123abc/_jsonpointer/contact/address/zip?rev=zyx987" \
> >>>   -d '54321'
> >>>
> >>> GET/DELETE accordingly. We could shortcut the `_jsonpointer` to just
> >>> `_` if we like the short magic.
> >>>
> >>> JSONPointer can deal with nested objects and lists and works fairly
> >>> well for this type of stuff, and it is rather simple to implement (even
> >>> I could do it:
> >>> https://github.com/janl/erl-jsonpointer/blob/master/src/jsonpointer.erl
> >>> -- This idea is literally 5 years old, it looks like, no need to use my
> >>> code if there is anything better).
> >>>
> >>> This is just a raw idea, and I'm happy to solve this any other way, if
> >>> somebody has a good approach.
> >>>
> >>> [2] https://tools.ietf.org/html/rfc6901
> >>>
> >>>
> >>> ## HTTP PATCH / JSON Diff
> >>>
> >>> Another stab at a similar problem is HTTP PATCH with JSON Diff, but
> >>> with the inherent problems of JSON normalisation, I'm leaning towards
> >>> the JSONPointer variant as simpler, but I'd be open for this as well,
> >>> if someone comes up with a good approach.
> >>>
> >>>
> >>> ## GraphQL[3]
> >>>
> >>> It's rather new, but getting good traction[4]. This would be a nice
> >>> addition to our API.
> >>> Somebody might already be hacking on this ;)
> >>>
> >>> [3]: http://graphql.org
> >>> [4]: http://githubengineering.com/the-github-graphql-api/
> >>>
> >>>
> >>> ## Mango for Document Validation
> >>>
> >>> The only place where we absolutely require writing JS is
> >>> validate_doc_update functions. Some security behaviour can only be
> >>> enforced there. With their inherent performance problems, I'd like to
> >>> get doc validations out of the path of the query server and would love
> >>> to find a way to validate document updates through Mango.
> >>>
> >>>
> >>> ## Redesign Security System
> >>>
> >>> Our security system has grown slowly and was not coherently designed.
> >>> We should start over. I have many ideas and opinions, but they are out
> >>> of scope for this. I think everybody here agrees that we can do better.
> >>> This *very likely* will *not* include per-document ACLs as per the
> >>> often stated issues with that approach in our data model.
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Replication
> >>>
> >>> This is our flagship feature of course, and there are a few things we
> >>> can do better.
> >>>
> >>>
> >>> ## Mobile-optimised extension or new version of the protocol
> >>>
> >>> The original protocol design didn't take mobile devices into account
> >>> and through PouchDB et al. we are now learning that there are a number
> >>> of downsides to our protocol. We've helped a lot with introducing
> >>> _bulk_get/_revs, but that's more a bandaid than a considered strategy ;)
> >>>
> >>> That new version could also be HTTP2-only, to take advantage of the new
> >>> connection semantics there.
> >>>
> >>>
> >>> ## Easy way to skip deletes on sync
> >>>
> >>> This one is self-explanatory, mobile clients usually don't need to sync
> >>> deletes from a year ago first. Mango filters might already get us
> >>> there, maybe we can do better.
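
(Inline note on the Mango-filter route just mentioned: a rough sketch of what
a replication document that skips tombstones might look like. It assumes the
`selector` field on replication documents is available in the CouchDB version
in use, and that matching on `_deleted` behaves as expected for deleted
revisions; both are worth verifying before relying on this. The helper name
and the database URLs are made up for illustration.)

```python
import json


def replication_without_deletes(source: str, target: str) -> dict:
    """Build a replication document that only matches non-deleted docs.

    Assumption: the replicator honours a Mango `selector` and the selector
    is evaluated against the `_deleted` flag of each changed revision.
    """
    return {
        "source": source,
        "target": target,
        # Only replicate documents that carry no `_deleted` field.
        "selector": {"_deleted": {"$exists": False}},
    }


# Hypothetical source and target databases.
doc = replication_without_deletes(
    "http://localhost:5984/mail",
    "http://localhost:5984/mail-mobile",
)
print(json.dumps(doc, indent=2))
```

The resulting JSON would be written to the _replicator database (or POSTed to
/_replicate). Note that this only filters what gets sent; it does nothing
about tombstones already on the target.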
> >>>
> >>>
> >>> ## Sync a rolling subset
> >>>
> >>> Say you always want to keep the last 90 days of email on a mobile
> >>> device with optionally back-loading older documents on user-request. It
> >>> is something I could see getting a lot of traction.
> >>>
> >>> Today, this can be built on 1.x with clever use of _purge, but that's
> >>> hardly a good experience. I don't know if it can be done in a cluster.
> >>>
> >>>
> >>> ## Selective Sync
> >>>
> >>> There might be other criteria than "last 90 days", so the more general
> >>> solution to this problem class would be arbitrary (e.g.
> >>> client-directed) selective sync, but this might be really hard as
> >>> opposed to just very hard for the "last 90 days" one, so happy to punt
> >>> on this first. But filters are generally not the answer, especially
> >>> with large data sets. Maybe proper sync from views _changes is the
> >>> answer.
> >>>
> >>>
> >>> ## A _db_updates powered _replicator DB
> >>>
> >>> Running thousands+ of replications on a server is not really resource
> >>> friendly today, we should teach the replicator to only run replication
> >>> on active databases via _db_updates. Somebody might already be looking
> >>> into this one.
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Storage
> >>>
> >>>
> >>> ## Pluggable Storage Engines
> >>>
> >>> Paul Davis already showed some work on allowing multiple different
> >>> storage backends. I'd like to see this land.
> >>>
> >>> ## Different Storage Backends
> >>>
> >>> These don't all have to be supported by the main project, but I'd
> >>> really like to see some experimentation with different backends like
> >>> LevelDB[5]/RocksDB[6], InnoDB[7], SQLite[8], or a native-Erlang one
> >>> that is optimised for space usage and not performance (I don't want to
> >>> budge on safety). Similarly, it'd be fun to see if there is a
> >>> compression format that we can use as a storage backend directly, so we
> >>> get full-DB compression as opposed to just per-doc compression.
> >>>
> >>> [5]: http://leveldb.org
> >>> [6]: http://rocksdb.org
> >>> [7]: https://en.wikipedia.org/wiki/InnoDB
> >>> [8]: https://www.sqlite.org
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Query
> >>>
> >>> ## Teach Mango JOINs and result sorting
> >>>
> >>> It's the natural path for query languages. We should make these happen.
> >>> Once we have the basics, we might even be able to find a way to compile
> >>> basic SQL into Mango, it's going to be glorious :)
> >>>
> >>>
> >>> ## "No-JavaScript"-mode
> >>>
> >>> I've hinted at this above, but I'd really like a way for users to use
> >>> CouchDB productively without having to write a line of JavaScript. My
> >>> main motivation is the poor performance characteristics of the Query
> >>> Server (hello CGI[9]?). But even with one that is improved, it will
> >>> always be faster to do any, say, filtering or validation operations in
> >>> native Erlang. I don't know if we can expand Mango to cover all this,
> >>> and I'm not really concerned about the specifics, as long as we get
> >>> there.
> >>>
> >>> Of course, for pro-users, the JS-variant will still be around.
> >>>
> >>> [9]: https://en.wikipedia.org/wiki/Common_Gateway_Interface
> >>>
> >>>
> >>> ## Query Server V2
> >>>
> >>> We need to revamp the Query Server. It is hardcoded to an out-of-date
> >>> version of SpiderMonkey and we are stuck with C-bindings that barely
> >>> anyone dares to look at, let alone iterate on.
> >>>
> >>> I believe the way forward is re-vamping the query server protocol to
> >>> use streaming IO instead of blocking batches like we do now, and use a
> >>> JS-native implementation of the JS-side instead of C-bindings.
> >>>
> >>> I'm partial to doing this straight in Node, because there is a ton of
> >>> support for things we need already, and I believe we've solved the
> >>> isolation issues required for secure MapReduce, but I'm happy to use
> >>> any other thing as well, if it helps.
> >>>
> >>> Other benefits would be support for emerging JS features that devs will
> >>> want to use.
> >>>
> >>> And we can have two modes: standalone QS like now, and embedded QS
> >>> where, say, V8 is compiled into the Erlang VM. Not everybody will want
> >>> to run this, but it'll be neat for those who do.
> >>>
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Cluster
> >>>
> >>> ## Rebalancing
> >>>
> >>> With this we will be able to grow clusters one by one instead of
> >>> hitting a wall when eventually each shard lives on a single machine.
> >>> E.g. when you add a node to the cluster, all other nodes share 1/Nth of
> >>> their data with the new node, and everything can keep going. Same for
> >>> removing a node and shrinking the cluster.
> >>>
> >>> Couchbase has this and it is really nice.
> >>>
> >>>
> >>> ## Setup
> >>>
> >>> Even without rebalancing, we need a nice Fauxton UI to manage the
> >>> cluster, so far we only have a simple setup procedure (which is great,
> >>> don't get me wrong), but users will want to do more elaborate cluster
> >>> management and we should make that easy with a slick UI.
> >>>
> >>>
> >>> ## Cluster-Aware Clients
> >>>
> >>> This might end up being not a good idea, but I'd like some
> >>> experimentation here. Say you'd have a CouchDB client that could be
> >>> hooked into the cluster topology so it'd know which nodes to query for
> >>> which data, then we can save a proxy-hop, and build clients that have
> >>> lower-latency access to CouchDB. Again, this is something that
> >>> Couchbase does and I think is worth exploring.
> >>>
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Fauxton
> >>>
> >>> Fauxton is great, but it could be better too, I think. I'm mostly
> >>> concerned about number of clicks/taps required for more specialised
> >>> actions (like setting the group_level of a reduce query, it's like 15
> >>> or so). More cluster info would also be nice, and maybe a specialised
> >>> dashboard for db-per-user setups.
> >>>
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Releases
> >>>
> >>>
> >>> ## Six-Week Release Trains
> >>>
> >>> We need to get back to frequent releases and I propose to go back to
> >>> our six-week-release train plans from three years ago. Whatever lands
> >>> within a release train time frame goes out. The nature of the change
> >>> dictates the version number increment as per semver, and we just ship a
> >>> new version every six weeks, even if it only includes a single bug fix.
> >>> We should automate most of this infrastructure, so actual releases are
> >>> cheap. We are reasonably close with this, but we need some more folks
> >>> to step up on using and maintaining our CI systems.
> >>>
> >>>
> >>> ## One major feature per major version
> >>>
> >>> I also propose to keep the scope of future major versions small, so we
> >>> don't have to wait another 3-5 years for 3.0. In particular, I think we
> >>> should focus on a single major feature per major version and get that
> >>> shipped within 6-12 months tops. If anything needs more time, it needs
> >>> to be broken up. Of course we continue to add features and fix things
> >>> while this happens, but as a project, there is *one* major feature we
> >>> push. For example, for 3.0 I see our push be behind HTTP2 support.
> >>> There is a lot of subsequent work required to make that happen, so
> >>> it'll be a worthwhile 3.0, but we can ship it in 6-12 months
> >>> (hopefully).
> >>>
> >>> Best case scenario, we have CouchDB 4.0 coming out 12 months from now
> >>> with two new major features. That would be amazing.
> >>>
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Performance
> >>>
> >>> ## Perf Team
> >>>
> >>> We need a team to comprehensively look at CouchDB performance. There is
> >>> a lot of low-hanging fruit like Robert Kowalski showed a while back, we
> >>> should get back into this.
> >>> I'm mostly inspired by SQLite who've done a release a while back that
> >>> only focussed on 1-2% performance improvements, but got like 20-30 of
> >>> those and made the thing a lot faster across the board. I can't
> >>> remember where I read about this, but I'll update this once I find the
> >>> link.
> >>>
> >>>
> >>> ## Benchmark Suite
> >>>
> >>> We need a benchmark suite that tests a variety of different work loads.
> >>> The goal here is to run different versions of CouchDB against the same
> >>> suite on the same hardware, to see where we are going. I'm imagining a
> >>> http://arewefastyet.com style dashboard where we can track this, and
> >>> even run this on Pull Requests and not allow them if they significantly
> >>> impact performance.
> >>>
> >>>
> >>> ## Synthetic Load Suite
> >>>
> >>> This one is for end users. I'd like to be able to say: My app produces
> >>> mostly 10-20kb-sized docs, but millions of those in a single database,
> >>> or across 1000s of databases, with these views etc. and then run this
> >>> on target hardware so I'd know, e.g. how many nodes I need for a
> >>> cluster with my estimated workload. I know this can only be done in
> >>> approximation, but I think this could make a big difference in CouchDB
> >>> adoption and feed back into the Perf Team mentioned above.
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Internals
> >>>
> >>> ## Consolidate Repositories
> >>>
> >>> With 2.0 we started to experiment with radically small modules for our
> >>> components and I think we've come to the conclusion that some
> >>> consolidation is better for us going forward. Obvious candidates for
> >>> separate repos are docs, Fauxton etc. but also some of the Erlang
> >>> modules that other projects reasonably would use.
> >>>
> >>>
> >>> ## Elixir
> >>>
> >>> I'd like it very much if we elevate Elixir as a prime target language
> >>> for writing CouchDB internals.
> >>> I believe this would get us an influx of new developers that we badly
> >>> need to get all the things I'm listing here done. Somebody might be
> >>> looking into the technical aspects of this already, but we need to
> >>> decide as a project if we are okay with that.
> >>>
> >>>
> >>> ## GitHub Issues
> >>>
> >>> I hope we can transition to GitHub Issues soon.
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Builds
> >>>
> >>> I'd like automated builds for source, Docker et al., rpm, deb, brew,
> >>> ports, Mac Binary, etc. with proper release channels for people to
> >>> subscribe to, all powered by CI for nightly builds, so people can test
> >>> in-development versions easily.
> >>>
> >>> I'd also like builds that include popular community plugins like Geo or
> >>> Fulltext Search.
> >>>
> >>>
> >>> * * *
> >>>
> >>>
> >>> # Features
> >>>
> >>> ## Better Support for db-per-user
> >>>
> >>> I don't know what this will look like, but this is a pattern, and we
> >>> need to support it better.
> >>>
> >>> One approach could be "virtual dbs" that are backed by a single
> >>> database, but that's usually at odds with views, so we could make this
> >>> an XOR and disable views on these dbs. Since this usually powers
> >>> client-heavy apps, querying usually happens there anyway.
> >>>
> >>> Another approach would be better / easier cross-db aggregation or
> >>> querying. There are a few approaches, but nothing really slick.
> >>>
> >>>
> >>> ## Schema Extraction
> >>>
> >>> I have half an (old) patch that extracts top level fields from a
> >>> document and stores them with a hash in an "attachment" to the database
> >>> header. So we only end up storing doc values and the schema hash. First
> >>> of all this trades storage for CPU time (I haven't measured anything
> >>> yet), but more interestingly, we could use that schema data to do smart
> >>> things like auto-generating a validation function / mango expression
> >>> based on the data that is already in the database.
> >>> And other fun things like easier schema migration operations that are
> >>> native in CouchDB and thus a lot faster than external ones. For the
> >>> curious ones, I've got the idea from V8's property access optimisation
> >>> strategy[10].
> >>>
> >>> [10]:
> >>> https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
> >>>
> >>> * * *
> >>>
> >>> Alright, that's it for now. Can't wait for your feedback!
> >>>
> >>> Best
> >>> Jan
> >>> --
> >>> Professional Support for Apache CouchDB:
> >>> https://neighbourhood.ie/couchdb-support/
> >>>
> >>
> >
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>

--
Nolan Lawson
nolanlawson.com
github.com/nolanlawson
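
P.S. On the sub-document operations idea in Jan's original mail: to make the
JSON Pointer part concrete, here is a minimal sketch of RFC 6901 resolution
plus a naive in-place update, run against the example doc from the thread.
The function names (`resolve_pointer`, `set_pointer`) are my own, and this is
not a claim about how a `_jsonpointer` endpoint would actually be implemented
in CouchDB; it ignores revisions, conflicts and the "-" array token entirely.

```python
from typing import Any


def _unescape(token: str) -> str:
    # RFC 6901: "~1" decodes to "/" first, then "~0" decodes to "~".
    return token.replace("~1", "/").replace("~0", "~")


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Return the value a JSON Pointer refers to inside a parsed JSON doc."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for raw in pointer.split("/")[1:]:
        token = _unescape(raw)
        if isinstance(current, list):
            current = current[int(token)]  # array index
        else:
            current = current[token]  # object member
    return current


def set_pointer(doc: Any, pointer: str, value: Any) -> None:
    """Replace the value at `pointer` in place (the parent must exist)."""
    parent_path, sep, last = pointer.rpartition("/")
    if sep == "":
        raise ValueError("cannot replace the document root in place")
    parent = resolve_pointer(doc, parent_path)
    key = _unescape(last)
    if isinstance(parent, list):
        parent[int(key)] = value
    else:
        parent[key] = value


# The example document from the thread.
doc = {
    "_id": "123abc",
    "_rev": "zyx987",
    "contact": {
        "name": "",
        "address": {"street": "Long Street", "nr": 123, "zip": "12345"},
    },
}

set_pointer(doc, "/contact/address/zip", "54321")
print(resolve_pointer(doc, "/contact/address/zip"))
```

The appeal for replication-heavy clients is that the request body shrinks to
the changed value; the server would still have to load the doc, apply the
change and write a new revision.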
