Hi Damien, the write queue will never be larger than 100 documents in this code. I think the primary constraint isn't the number of documents in the database but the size of the average document. I'm buffering 100 at a time without considering the size of each record, so a DB with lots of large attachments could run into memory problems quickly.
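
To make that concrete, what I have in mind is flushing on a byte limit as well as a doc-count limit. A rough sketch (made-up names, not the attached patch):

-module(rep_buffer_sketch).
-export([queue_doc/2]).

-define(MAX_BATCH_DOCS, 100).
-define(MAX_BATCH_BYTES, 10*1024*1024).   %% arbitrary 10 MB cap

%% queue_doc/2 and flush/1 are invented names, and term_to_binary is only a
%% rough proxy for a doc's real on-the-wire size, but the idea is to flush
%% on whichever limit (count or bytes) is hit first.
queue_doc(Doc, {Docs, Bytes}) ->
    Bytes2 = Bytes + size(term_to_binary(Doc)),
    Docs2 = [Doc | Docs],
    case length(Docs2) >= ?MAX_BATCH_DOCS orelse Bytes2 >= ?MAX_BATCH_BYTES of
        true ->
            flush(lists:reverse(Docs2)),
            {[], 0};
        false ->
            {Docs2, Bytes2}
    end.

flush(Docs) ->
    %% in the real patch this is where the batch goes to the target
    io:format("flushing ~p docs~n", [length(Docs)]).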

I did some testing with large (>1MB) binary attachments this afternoon and ran into a lot of stability issues at high concurrency. Quite a few of the replicator's HTTP processes (both GET and POST) died gruesome deaths with session_remotely_closed errors. I found this thread on trapexit which looked to be related:

http://www.trapexit.org/forum/viewtopic.php?p=44020

but nothing definitive. I'll keep digging. In the meantime I'm tempted to have the replicator retry a request when it sees that error instead of letting the process die; a rough sketch of what I mean is below. Best, Adam
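
(Hypothetical wrapper, not part of the attached patch; it assumes the requests go through ibrowse:send_req/4, and the retry count and pause are arbitrary.)

-module(rep_retry_sketch).
-export([request_with_retry/4]).

request_with_retry(Url, Headers, Method, Body) ->
    request_with_retry(Url, Headers, Method, Body, 3).

request_with_retry(_Url, _Headers, _Method, _Body, 0) ->
    {error, retries_exhausted};
request_with_retry(Url, Headers, Method, Body, Retries) ->
    case ibrowse:send_req(Url, Headers, Method, Body) of
        {error, session_remotely_closed} ->
            timer:sleep(500),   %% brief pause before re-issuing the request
            request_with_retry(Url, Headers, Method, Body, Retries - 1);
        Response ->
            Response
    end.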


On Nov 11, 2008, at 3:56 PM, Damien Katz wrote:

Wow. Very cool!

One thing that was a problem in the past when attempting to make things more parallel was process queues getting backed up. On large replications, if the write target was slow, the readers would be much faster than the writers and the write queue would get huge, which could cause the Erlang VM's memory usage to skyrocket, slowing everything else down and sometimes crashing. As a temporary fix, the current replicator only keeps a single doc queued up at a time.
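
To illustrate, the kind of throttled handoff I mean looks roughly like this (made-up names, not the old replicator code). The reader blocks on the writer's ack, so a slow target slows the readers down instead of letting the queue, and the VM's memory, grow without bound:

-module(rep_backpressure_sketch).
-export([writer_loop/1, enqueue/2]).

-define(MAX_QUEUED, 10).

writer_loop(Queue) when length(Queue) >= ?MAX_QUEUED ->
    flush_to_target(lists:reverse(Queue)),
    writer_loop([]);
writer_loop(Queue) ->
    receive
        {enqueue, From, Doc} ->
            From ! ok,                    %% ack lets the reader continue
            writer_loop([Doc | Queue])
    end.

enqueue(Writer, Doc) ->
    Writer ! {enqueue, self(), Doc},
    receive ok -> ok end.                 %% reader waits while the writer is busy

flush_to_target(Docs) ->
    %% stand-in for the actual write to the target database
    io:format("writing ~p docs to target~n", [length(Docs)]).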

I've not looked closely at this yet. Is there anything in this implementation that would exhibit similar behavior if something gets behind, or if the number of documents is huge?

-Damien


On Nov 11, 2008, at 1:55 PM, Adam Kocoloski (JIRA) wrote:


[ https://issues.apache.org/jira/browse/COUCHDB-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-160:
-----------------------------------

  Attachment: couch_rep.erl.diff

should have mentioned before -- the times quoted are in seconds.

replication performance improvements
------------------------------------

              Key: COUCHDB-160
              URL: https://issues.apache.org/jira/browse/COUCHDB-160
          Project: CouchDB
       Issue Type: Improvement
       Components: Database Core
 Affects Versions: 0.9
         Reporter: Adam Kocoloski
         Priority: Minor
      Attachments: couch_rep.erl.diff


I wrote some code to speed up CouchDB's replication process by parallelizing document requests and using _bulk_docs to write changes to the target. I tested the speedup as follows:
* 1000 document DB, 1022 update_seq, ~450 KB after compaction
* local and remote machines have ~45 ms latency
* timed requests using timer:tc(couch_rep, replicate, [<<"source">>, <<"target">>])
* all replications are "from scratch"
trunk (times in seconds):
local-local     115
local-remote    145
remote-remote   173
remote-local    146
db size after replication: 1.8 MB
patch (times in seconds):
local-local     1.83
local-remote    38
remote-remote   64
remote-local    35
db size after replication: 453 KB
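
For reference, the write side boils down to one _bulk_docs POST per batch. Schematically (not the literal code from the diff; it assumes the batch has already been JSON-encoded as {"docs": [...]}):

-module(rep_bulk_write_sketch).
-export([bulk_write/2]).

%% Target is the target DB's base URL (with trailing slash); Body is the
%% already-encoded JSON payload {"docs": [...]}.
bulk_write(Target, Body) ->
    {ok, "201", _Hdrs, _Resp} = ibrowse:send_req(
        Target ++ "_bulk_docs",
        [{"Content-Type", "application/json"}],
        post, Body),
    ok.
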
I'll attach the patch as an update to this issue. It might be worth exposing the "batch size" (currently 100 docs) as a configurable parameter; a two-line sketch of what I mean follows at the end of this message. Comments welcome. Best,
Adam
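
The batch-size knob could be as simple as reading the value from the ini at startup. The section and key names below are invented, and this assumes couch_config:get/3 is available in trunk:

%% hypothetical "replicator" section and "batch_size" key
BatchSize = list_to_integer(
    couch_config:get("replicator", "batch_size", "100"))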

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


