Dear CouchDB Community,

Replication is one of the central components of CouchDB. It’s what makes our
database unique and gives it properties few others have, such as
master-to-master replication, offline-first capability, and the ability to set
up custom replication topologies to suit various use cases.

Over at IBM Cloudant we found that our users often push CouchDB’s replication
to its limits, and in some circumstances it struggles with stability,
performance and operability. Last year we identified a few of the more
pressing areas and set out to see if we could improve them. Some of the
issues are summarized here:

https://issues.apache.org/jira/browse/COUCHDB-3324

The top two were:

1) Allow running a larger number of replication jobs without crashing or
affecting the rest of the system.

2) Do not write transient replication states back to replication documents.
Doing so consumes disk IO and causes operational issues. Instead, provide a
status monitoring API specific to replications.

We believe we accomplished those two goals and improved other areas as
well. Today we’d like to share that work with the rest of the community. We
are proposing these two PRs:

https://github.com/apache/couchdb/pull/470 : This is the main PR, updating
the replicator application. The PR description includes a detailed list of
improvements and a nifty graph showing how the new scheduling replicator
behaves in the presence of a large number of jobs.

https://github.com/apache/couchdb-documentation/pull/123 : Documentation PR
describing the new behavior. Thanks to a recent initiative from Joan
Touzet, building our docs is now fast and easy. The PR includes a state
transition diagram that describes all possible replication job states.

Thanks to Garren Smith and other members of the Dashboard Team, Fauxton
support for the new replicator has already been merged in.

Many thanks also to the other team members who helped us: Paul Davis for
help with design and architecture, and Rob Frazier and Gino Cubeddu for
their testing and validation work.

We'd like to request approval to integrate these changes into the ASF
CouchDB project. Please test, review, and provide comments and feedback.

Sincerely,
IBM Cloudant Engineering (Robert Newson, Benjamin Bastian, Nick Vatamaniuc)
