I'm writing to check whether modifying the replication batch_count and batch_size parameters for cluster replication is a good idea.
Some background: our data platform dev team noticed that under heavy write load, cluster replication was falling behind. The following warning messages started appearing in the logs, and the pending_changes value increased steadily while under load:

[warning] 2017-05-18T20:15:22.320498Z [email protected] <0.316.0> -------- mem3_sync shards/a0000000-bfffffff/test.1495137986 [email protected] {pending_changes,474}

What we saw matches COUCHDB-3421 <https://issues.apache.org/jira/browse/COUCHDB-3421>. In addition, CouchDB appears to be CPU bound while this is happening, not I/O bound as one would expect for replication.

When we looked into the source, we found two values affecting internal replication: batch_size and batch_count. For cluster replication these are fixed at 100 and 1 respectively, so we patched them to be configurable. We tried various values, and increasing batch_size (and, to a lesser extent, batch_count) improves our write performance. As a point of reference, with batch_count=50 and batch_size=5000 we can handle roughly double the write throughput with no warnings. We are still experimenting with other values.

We wanted to know whether adjusting these parameters is a sound approach. Thanks!

- Phil
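For reference, our local patch exposes the two values through the config system roughly like this. To be clear, the section and key names below are our own invention from the patch, not stock CouchDB settings; unpatched CouchDB hard-codes batch_size=100 and batch_count=1 in mem3.

```ini
; Hypothetical local.ini fragment -- these keys only exist in our patched
; build and are shown here to illustrate the values we tested.
[mem3]
; documents pulled per batch during internal (shard-sync) replication
sync_batch_size = 5000
; batches processed per replication job iteration
sync_batch_count = 50
```

With these settings a rolling restart of the nodes picks up the new values, and the pending_changes warnings stop appearing under the same write load.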
