[ https://issues.apache.org/jira/browse/COUCHDB-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Kocoloski updated COUCHDB-160:
-----------------------------------
Attachment: couch_rep_v2.diff
Here's an updated patch that uses persistent connections and pipelining to
further accelerate replications where the source is remote. Updated benchmarks
show roughly a 3x improvement for remote-local relative to my first patch, or
roughly 10x faster replication than trunk overall:
parallel+pipeline:
  local-remote    31
  remote-remote   36
  remote-local    13
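
As a quick check on those ratios, here is a small Python sketch (not part of the patch) that divides the timings from the original report quoted further down by the new ones. The dictionary and function names are illustrative only, and the timings are used exactly as reported.

```python
# Timings copied from this message and the quoted original report.
trunk = {"local-remote": 145, "remote-remote": 173, "remote-local": 146}
patch_v1 = {"local-remote": 38, "remote-remote": 64, "remote-local": 35}
patch_v2 = {"local-remote": 31, "remote-remote": 36, "remote-local": 13}


def speedup(before, after):
    """Return before/after per scenario, for the scenarios present in `after`."""
    return {name: before[name] / after[name] for name in after}


if __name__ == "__main__":
    for label, baseline in (("vs. first patch", patch_v1), ("vs. trunk", trunk)):
        for name, ratio in speedup(baseline, patch_v2).items():
            print(f"{name:14s} {label:16s} {ratio:.1f}x")
```

For remote-local this gives 35/13, about 2.7x over the first patch, and 146/13, about 11.2x over trunk.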
Note the asymmetry for local-remote vs. remote-local. Replications to remote
targets still negotiate a new TCP connection for every POST. We're not allowed
to pipeline POSTs (they're non-idempotent), but there's nothing wrong with
sending them over a persistent connection. Last I heard, Erlang's HTTP client
needs to be updated to deal with that particular use case:
http://www.erlang.org/pipermail/erlang-questions/2008-August/037113.html
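
To illustrate the distinction, here is a minimal Python sketch (again, not the Erlang in the patch) of sending several POSTs over one persistent connection without pipelining: each request waits for its response before the next is sent, but the TCP connection is reused. The toy server, the helper names, and the /target/_bulk_docs path are assumptions made for the demo.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a remote target; records the client port of each POST."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    seen_ports = []

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        CountingHandler.seen_ports.append(self.client_address[1])
        body = b'{"ok": true}'
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_batches(host, port, batches, path="/target/_bulk_docs"):
    """POST each batch in turn over a single reused connection (no pipelining)."""
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    try:
        for docs in batches:
            payload = json.dumps({"docs": docs}).encode("utf-8")
            conn.request("POST", path, body=payload,
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()  # drain the response before reusing the connection
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses


def demo():
    """Run three POSTs against the toy server; return statuses and connection count."""
    CountingHandler.seen_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        batches = [[{"_id": f"doc{i}"}] for i in range(3)]
        statuses = post_batches("127.0.0.1", server.server_address[1], batches)
    finally:
        server.shutdown()
        server.server_close()
    return statuses, len(set(CountingHandler.seen_ports))
```

Every POST arrives from the same client port, so only one TCP handshake is paid for the whole run; that is the saving the local-remote case is still missing.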
Best, Adam
> replication performance improvements
> ------------------------------------
>
> Key: COUCHDB-160
> URL: https://issues.apache.org/jira/browse/COUCHDB-160
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.9
> Reporter: Adam Kocoloski
> Priority: Minor
> Attachments: couch_rep.erl.diff, couch_rep_v2.diff
>
>
> I wrote some code to speed up CouchDB's replication process by parallelizing
> document requests and using _bulk_docs to write changes to the target. I
> tested the speedup as follows:
> * 1000 document DB, 1022 update_seq, ~450 KB after compaction
> * local and remote machines have ~45 ms latency
> * timed requests using timer:tc(couch_rep, replicate, [<<"source">>,
>   <<"target">>])
> * all replications are "from scratch"
> trunk:
>   local-local     115
>   local-remote    145
>   remote-remote   173
>   remote-local    146
>   db size after replication: 1.8 MB
> patch:
>   local-local     1.83
>   local-remote    38
>   remote-remote   64
>   remote-local    35
>   db size after replication: 453 KB
> I'll attach the patch as an update to this issue. It might be worth exposing
> the "batch size" (currently 100 docs) as a configurable parameter. Comments
> welcome. Best,
> Adam
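
The approach in the quoted report (fetch documents in parallel, then write each batch to the target in one bulk request) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in the attached diff: fetch_doc, write_bulk, and workers are hypothetical names, and batch_size defaults to the 100 docs mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size=100):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def replicate(doc_ids, fetch_doc, write_bulk, batch_size=100, workers=10):
    """Fetch each batch of documents concurrently, then write it in one call.

    fetch_doc(doc_id) and write_bulk(docs) are caller-supplied callables
    (hypothetical here); write_bulk stands in for a single _bulk_docs request.
    Returns the number of documents written.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for ids in chunked(doc_ids, batch_size):
            docs = list(pool.map(fetch_doc, ids))  # parallel fetch, order preserved
            write_bulk(docs)
            written += len(docs)
    return written
```

Exposing batch_size as a parameter, as suggested above, lets the number of bulk writes be traded off against the size of each request.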