I'm writing to ask whether modifying the batch_count and batch_size
parameters for cluster replication is a good idea.

Some background – our data platform dev team noticed that under heavy write
load, cluster replication was falling behind. The following warning
messages started appearing in the logs, and the pending_changes value
consistently increased while under load.

[warning] 2017-05-18T20:15:22.320498Z [email protected] <0.316.0>
-------- mem3_sync shards/a0000000-bfffffff/test.1495137986
[email protected]
{pending_changes,474}

What we saw is described in COUCHDB-3421
<https://issues.apache.org/jira/browse/COUCHDB-3421>. In addition, while
this is occurring CouchDB appears to be CPU bound rather than I/O bound,
which is not what we would expect for replication.

When we looked into this, we found two values in the source that affect
replication: batch_size and batch_count. For cluster replication these
are fixed at 100 and 1 respectively, so we made them configurable. We
tried various values, and increasing batch_size (and, to a lesser
extent, batch_count) improves our write performance. As a point of
reference, with batch_count=50 and batch_size=5000 we can handle about
double the write throughput with no warnings. We are experimenting with
other values.
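
For reference, the change we are testing just reads the two values from
the standard .ini configuration in place of the hard-coded constants.
Something along these lines (the section and key names here are our own
invention, not part of stock CouchDB):

    [mem3]
    ; stock CouchDB hard-codes batch_size=100 and batch_count=1
    sync_batch_size = 5000
    sync_batch_count = 50
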

We wanted to know if adjusting these parameters is a sound approach.

Thanks!

- Phil
