Dear CouchDB Community,

Replication is one of the central components of CouchDB. It’s what makes our database unique and gives it properties few others have, such as master-to-master replication, offline-first capability, and the ability to set up custom replication topologies to suit various use cases.
Over at IBM Cloudant we found that our users often push CouchDB’s replication to its limits, and in some circumstances it struggles with stability, performance and operability. Last year we identified a few of the more pressing areas and set out to see if we could improve them. Some of the issues are summarized here: https://issues.apache.org/jira/browse/COUCHDB-3324

The top two were:

1) Allow running a larger number of replication jobs without crashing or affecting the rest of the system.

2) Do not write transient replication states back to replication documents. This consumes disk IO and causes operational issues. Instead, provide a status monitoring API specific to replications.

We believe we accomplished those two goals and improved on other areas as well. Today we’d like to share that work with the rest of the community. We are proposing these two PRs:

https://github.com/apache/couchdb/pull/470 : This is the main one, updating the replicator application. The PR description includes a detailed list of improvements and a nifty graph showing how the new scheduling replicator behaves in the presence of a large number of jobs.

https://github.com/apache/couchdb-documentation/pull/123 : Documentation PR describing the new behavior. Thanks to a recent initiative from Joan Touzet, building our docs is now fast and easy. The docs include a state transition diagram describing all the possible replication job states.

Thanks to Garren Smith and other members of the Dashboard Team, Fauxton support for the new replicator has already been merged in. Also, many thanks to other team members who helped us: Paul Davis for helping with design and architecture, and Rob Frazier and Gino Cubeddu for their testing and validation work.

We'd like to request approval to integrate these changes into the ASF CouchDB project. Please test, review, and provide comments and feedback.

Sincerely,
IBM Cloudant Engineering (Robert Newson, Benjamin Bastian, Nick Vatamaniuc)
