Hi Damien, the write queue will never be larger than 100 documents in
this code. I think the primary constraint isn't the number of
documents in the database but the size of the average document. I'm
buffering 100 at a time without considering the size of each record,
so a DB with lots of large attachments could run into memory problems
quickly.
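One way to guard against that would be to flush on whichever cap is
hit first, doc count or accumulated bytes. A sketch only -- the
module name, the 10 MB cap, and the size bookkeeping are all
hypothetical, not what the patch does today:

    -module(rep_buffer).
    -export([buffer_doc/3]).

    -define(MAX_DOCS, 100).
    -define(MAX_BYTES, 10 * 1024 * 1024).  %% hypothetical 10 MB cap

    %% Flush when either the doc count or the accumulated byte size of
    %% the buffered docs hits its cap. Size is the caller's estimate of
    %% the encoded doc (attachments included); Flush is the batch
    %% writer, e.g. a _bulk_docs POST.
    buffer_doc({Doc, Size}, {Docs, N, Bytes}, Flush)
            when N + 1 >= ?MAX_DOCS; Bytes + Size >= ?MAX_BYTES ->
        Flush(lists:reverse([Doc | Docs])),
        {[], 0, 0};
    buffer_doc({Doc, Size}, {Docs, N, Bytes}, _Flush) ->
        {[Doc | Docs], N + 1, Bytes + Size}.

The caller just threads the {Docs, Count, Bytes} triple through and
supplies its writer as Flush.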
I tried to do some testing with large (>1MB) binary attachments this
afternoon and ran into a lot of stability issues at high concurrency.
Quite a few of the replicator's HTTP processes (both GET and POST)
died gruesome deaths with session_remotly_closed errors. I found this
thread on trapexit which looked to be related:
http://www.trapexit.org/forum/viewtopic.php?p=44020
but nothing definitive. I'll keep digging. Best, Adam
On Nov 11, 2008, at 3:56 PM, Damien Katz wrote:
Wow. Very cool!
One thing that was a problem in the past when attempting to make
things more parallel was process queues getting backed up. On large
replications, if the write target was slow, the readers would be much
faster than the writers, the write queue would grow huge, and the
Erlang VM's memory usage would skyrocket, slowing everything else
down and sometimes crashing. As a temporary fix, the current
replicator only keeps a single doc queued up at a time.
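A bounded buffer is the general fix for that: the reader blocks on
enqueue once the queue is full, so memory stays capped no matter how
far the writer falls behind. A rough sketch, illustrative only and
not the replicator's actual code:

    -module(bounded_buffer).
    -export([start/1, enqueue/2, dequeue/1]).

    %% Holds at most Max docs. When full, enqueue requests wait in the
    %% buffer's mailbox, so a fast reader blocks until the slow writer
    %% drains the queue instead of VM memory ballooning.
    start(Max) ->
        spawn(fun() -> loop(queue:new(), 0, Max) end).

    enqueue(Buf, Doc) ->                %% reader side: blocks when full
        Buf ! {enqueue, self(), Doc},
        receive ok -> ok end.

    dequeue(Buf) ->                     %% writer side: blocks when empty
        Buf ! {dequeue, self()},
        receive {doc, Doc} -> Doc end.

    loop(Q, N, Max) ->
        receive
            {enqueue, From, Doc} when N < Max ->
                From ! ok,
                loop(queue:in(Doc, Q), N + 1, Max);
            {dequeue, From} when N > 0 ->
                {{value, Doc}, Q2} = queue:out(Q),
                From ! {doc, Doc},
                loop(Q2, N - 1, Max)
        end.

The selective receive does the throttling: a full buffer simply
leaves {enqueue, ...} messages unmatched until a dequeue frees a
slot.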
I've not looked closely at this yet. Is there anything in this
implementation that would exhibit similar behavior if something
gets behind, or the number of documents is huge?
-Damien
On Nov 11, 2008, at 1:55 PM, Adam Kocoloski (JIRA) wrote:
Adam Kocoloski updated COUCHDB-160:
-----------------------------------
Attachment: couch_rep.erl.diff
Should have mentioned before -- the times quoted are in seconds.
replication performance improvements
------------------------------------
Key: COUCHDB-160
URL: https://issues.apache.org/jira/browse/COUCHDB-160
Project: CouchDB
Issue Type: Improvement
Components: Database Core
Affects Versions: 0.9
Reporter: Adam Kocoloski
Priority: Minor
Attachments: couch_rep.erl.diff
I wrote some code to speed up CouchDB's replication process by
parallelizing document requests and using _bulk_docs to write
changes to the target. I tested the speedup as follows:
* 1000 document DB, 1022 update_seq, ~450 KB after compaction
* local and remote machines have ~45 ms latency
* timed requests using timer:tc(couch_rep, replicate,
  [<<"source">>, <<"target">>])
* all replications are "from scratch"
trunk (times in seconds):
  local-local      115
  local-remote     145
  remote-remote    173
  remote-local     146
  db size after replication: 1.8 MB

patch (times in seconds):
  local-local      1.83
  local-remote     38
  remote-remote    64
  remote-local     35
  db size after replication: 453 KB
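For anyone who hasn't used _bulk_docs: the write side boils down to
one POST per batch. A minimal sketch using OTP's inets client (the
patch itself goes through CouchDB's own HTTP plumbing; the
hand-rolled JSON here just keeps the example dependency-free):

    -module(bulk_write).
    -export([post_bulk_docs/2]).

    %% POST a batch of already-JSON-encoded docs (as strings) to the
    %% target's _bulk_docs endpoint. CouchDB answers 201 Created on
    %% success.
    post_bulk_docs(TargetUrl, DocJsonList) ->
        inets:start(),
        Body = ["{\"docs\":[", string:join(DocJsonList, ","), "]}"],
        {ok, {{_, 201, _}, _Headers, _Resp}} =
            httpc:request(post,
                          {TargetUrl ++ "/_bulk_docs", [],
                           "application/json", iolist_to_binary(Body)},
                          [], []),
        ok.

At ~45 ms of latency, writing a 100-doc batch this way costs one
round trip instead of 100 separate requests.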
I'll attach the patch as an update to this issue. It might be
worth exposing the "batch size" (currently 100 docs) as a
configurable parameter. Comments welcome. Best,
Adam
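P.S. If we do expose that knob, it could be a one-liner. Hypothetical
[replicator] ini section shown, read through couch_config:

    %% Hypothetical config lookup -- a [replicator] batch_size
    %% setting, defaulting to the current hard-coded 100 docs.
    batch_size() ->
        list_to_integer(
            couch_config:get("replicator", "batch_size", "100")).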
--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.