Hi again. I put the new replicator through its paces over the past few days and am pretty happy with it. Summary of the changes:

http://github.com/kocolosk/couchdb/commits/otpify-replication

* Only one copy of a given source->target replication can run at any given time. Other POSTs to _replicate with the same request body are added as listeners on the ongoing replication. When that replication finishes, we respond to the original requester and check for new updates to include in the responses to the others. This is Option 3 in the previous thread on dev@ (there's a rough sketch of the mechanism after this list).

* Replications that terminate with an abnormal reason (anything other than shutdown) will be restarted by the supervisor, but the listeners on the original request just receive an {error, Reason} response. They can track the result of the retry by re-POSTing (see the supervisor sketch below).

* Attachments are streamed directly to disk in the case of a pull replication. This feature was made possible by some bleeding-edge updates to ibrowse, courtesy of Chandru Mullaparthi. It still requires quite a lot of memory in the ibrowse processes, but we're working on that. The memory footprint in the CouchDB processes is negligible. Testing between two EC2 nodes indicated that pulling 400MB of attachments was approximately 20x faster than pushing them (push still inlines attachments in the JSON). A streaming sketch follows the list.

* Memory utilization has decreased. All the tests posted by Jeff Hinrichs in COUCHDB-270 pass, although the tests with 20MB JSON documents still use over a gig of memory in ibrowse processes.

* The parallel asynchronous GET requests we used in pull replications have been temporarily removed. I have a branch that includes them, but I want to make sure there are no regressions in memory utilization before merging that feature back in. As a result, you'll find that push replications are significantly faster than pulls when no large attachments are involved.

* Replication updates show up in the Futon status window (reporting sketch below).
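
To make some of the above concrete, here are a few rough sketches. None of this is the actual branch code; the module names, helpers like do_replicate/1, and details are mine, invented for illustration. First, the listener-queueing idea from the first bullet, as a gen_server keyed on {Source, Target}:

    -module(rep_dedup).
    -behaviour(gen_server).

    -export([start_link/0, replicate/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% Blocks until the (possibly shared) replication finishes.
    replicate(Source, Target) ->
        gen_server:call(?MODULE, {replicate, Source, Target}, infinity).

    %% State: #{{Source, Target} => {WorkerPid, [From]}}
    init([]) ->
        {ok, #{}}.

    handle_call({replicate, Source, Target}, From, Jobs) ->
        Key = {Source, Target},
        case maps:find(Key, Jobs) of
            {ok, {Pid, Listeners}} ->
                %% Same replication already running: register the caller
                %% as an extra listener instead of starting a duplicate.
                {noreply, Jobs#{Key := {Pid, [From | Listeners]}}};
            error ->
                Server = self(),
                {Pid, _Ref} = spawn_monitor(fun() ->
                    gen_server:cast(Server, {done, Key, do_replicate(Key)})
                end),
                {noreply, Jobs#{Key => {Pid, [From]}}}
        end.

    %% Normal completion: everyone who POSTed while the job ran gets a reply.
    handle_cast({done, Key, Result}, Jobs) ->
        {_Pid, Listeners} = maps:get(Key, Jobs),
        [gen_server:reply(L, Result) || L <- Listeners],
        {noreply, maps:remove(Key, Jobs)}.

    %% Abnormal exit: hand the waiting listeners {error, Reason}; they
    %% can re-POST to follow the retry.
    handle_info({'DOWN', _Ref, process, Pid, Reason}, Jobs)
            when Reason =/= normal ->
        case [KV || {_K, {P, _Ls}} = KV <- maps:to_list(Jobs), P =:= Pid] of
            [{Key, {_P, Listeners}}] ->
                [gen_server:reply(L, {error, Reason}) || L <- Listeners],
                {noreply, maps:remove(Key, Jobs)};
            [] ->
                {noreply, Jobs}
        end;
    handle_info(_Msg, Jobs) ->
        {noreply, Jobs}.

    terminate(_Reason, _Jobs) -> ok.
    code_change(_OldVsn, Jobs, _Extra) -> {ok, Jobs}.

    %% Stand-in for the real replication work.
    do_replicate({_Source, _Target}) ->
        {ok, []}.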
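Second, the retry behaviour from the second bullet maps naturally onto a transient child spec. This is a hypothetical supervisor, not the branch's actual tree, but 'transient' is exactly the restart semantics described: abnormal exits are restarted, while normal and shutdown exits are left alone.

    -module(rep_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
        %% 'transient' children are restarted only after an abnormal exit;
        %% 'normal', 'shutdown', and {shutdown, _} exits are not retried.
        Child = #{id => rep_worker,
                  start => {rep_worker, start_link, []},  % hypothetical worker
                  restart => transient,
                  shutdown => 2000,
                  type => worker},
        {ok, {SupFlags, [Child]}}.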
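Third, the attachment streaming. The real code hands chunks to CouchDB's attachment writer rather than a raw file, but this sketch shows the ibrowse stream_to mechanics that make it possible (it assumes the ibrowse application has been started):

    -module(pull_stream).
    -export([fetch_to_file/2]).

    fetch_to_file(Url, Path) ->
        {ok, Fd} = file:open(Path, [write, raw, binary]),
        %% stream_to makes ibrowse deliver the response as async
        %% messages instead of accumulating it in one binary.
        {ibrowse_req_id, ReqId} =
            ibrowse:send_req(Url, [], get, [], [{stream_to, self()}]),
        Result = receive_chunks(ReqId, Fd),
        ok = file:close(Fd),
        Result.

    receive_chunks(ReqId, Fd) ->
        receive
            {ibrowse_async_headers, ReqId, _Status, _Headers} ->
                receive_chunks(ReqId, Fd);
            {ibrowse_async_response, ReqId, {error, Reason}} ->
                {error, Reason};
            {ibrowse_async_response, ReqId, Chunk} ->
                %% Each chunk goes to disk as it arrives, so the
                %% attachment never has to sit in memory in one piece.
                ok = file:write(Fd, Chunk),
                receive_chunks(ReqId, Fd);
            {ibrowse_async_response_end, ReqId} ->
                ok
        after 30000 ->
            {error, timeout}
        end.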
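And the Futon status reporting follows a pattern like the one below. The exact call shapes here are my assumptions about a couch_task_status-style API, not code lifted from the branch:

    -module(rep_status).
    -export([start_task/2, report_progress/2]).

    %% Register this process in the status list that Futon displays.
    %% (Assumed API shape, for illustration only.)
    start_task(Source, Target) ->
        couch_task_status:add_task(
            "Replication", Source ++ " -> " ++ Target, "Starting").

    %% Called periodically as the replication checkpoints.
    report_progress(Done, Total) ->
        couch_task_status:update("Processed ~p of ~p changes", [Done, Total]).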

Side note: I redefined couch_util:should_flush() so that it considers both the process's memory usage and any associated reference-counted binaries when calculating its response. The code to find those binaries relies on a part of process_info() that "may be changed or removed without prior notice" and isn't terribly well documented, but we ignore binary memory usage at our peril. With the old definition, should_flush() would come back false even when the replicator had grabbed 2GB worth of binaries off the disk!
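
For the curious, the calculation looks roughly like this. It's a simplified sketch of the idea, not the exact couch_util code, and the threshold handling is made up:

    -module(flush_check).
    -export([should_flush/1]).

    should_flush(ThresholdBytes) ->
        {memory, ProcMem} = process_info(self(), memory),
        %% process_info(Pid, binary) is the under-documented part: it
        %% returns {binary, BinInfo} where BinInfo is a list of
        %% {BinaryAddress, ByteSize, RefCount} tuples, one per off-heap
        %% refc binary this process holds a reference to.
        {binary, BinInfo} = process_info(self(), binary),
        BinMem = lists:sum([Size || {_Addr, Size, _RefCount} <- BinInfo]),
        ProcMem + BinMem > ThresholdBytes.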

If there are no objections, I'd like to add this code to trunk later today. Best,

Adam
