On Apr 21, 2010, at 8:33 AM, Miles Fidelman wrote:

> J Chris Anderson wrote:
>> On Apr 20, 2010, at 7:29 PM, Miles Fidelman wrote:
>>
>>> I've been looking, but can't seem to find any good documentation of the
>>> inter-node protocol used for replication.
>>>
>> As far as I know, the best source for documentation is the code, right now.
>>
> <snip>
>> This is the hard coding (in Ruby) I had to add, to use the CouchDB
>> replicator to pull from the Booth server:
>>
>> http://github.com/jchris/booth/commit/2deff74e03838a6e7ef95b725c4342a08239a2b8#commitcomment-68685
>>
>>
> aaarrrgggh......
>
> I don't suppose anyone out there has scribbled down anything resembling a
> sequence diagram or flowchart or list of bullet points or something that
> summarizes the steps that happen, and the code that gets run when POST
> /_replicate is invoked, or an ASN.1-like summary of the messages that get
> exchanged between two couch instances during replication
>
> right now, replication reminds me of the old Sidney Harris Cartoon, "then a
> miracle occurs" (http://www.sciencecartoonsplus.com/pages/gallery.php)
>
> --
> In theory, there is no difference between theory and practice.
> In<fnord> practice, there is. .... Yogi Berra

Hi Miles,

Simon Metson reminded me that I wrote down something like this for him a few
months back. Here it is. It describes the replication workflow using inline
document attachments, rather than the more efficient multipart requests which
are supported in 0.11. Hope it helps. Regards,

Adam

On 8 Dec 2009, at 01:42, Adam Kocoloski wrote:

> So, the sequence of calls depends on whether you're pulling updates from this
> remote server or pushing updates to it. Let's consider the two cases
> separately:
>
> ## Pull Replication (remote source, local target)
>
> ### HEAD /db
> Respond with a 200 status code and you're good.
>
> ### GET /db/_local/<rep id>
> The replicator checkpoints its progress in these _local documents. You can
> respond with a 404 if you like, otherwise the response should be JSON that
> looks very much like a replication response, e.g. the one described here:
>
> http://books.couchdb.org/relax/reference/replication#Replication%20in%20Detail
>
> Basically, if the _local doc exists on both the source and target DBs, and
> the two documents agree on the value of "source_last_seq", the replicator
> will start from that update sequence on the source.
>
> ### GET /db/_changes?style=all_docs&heartbeat=10000&since=N[&feed=continuous]
>
> This is the hard part. The replicator makes this request on a separate
> connection to your server, asking for a list of changes since N (the
> source_last_seq from the previous step). If the replication is meant to be
> permanent, the feed=continuous parameter will be supplied. The best
> reference for the response format is definitely the O'Reilly book:
>
> http://books.couchdb.org/relax/reference/change-notifications
>
> ### GET /db/docid?revs=true&latest=true&open_revs=["1-23420432",...]
>
> You'll see one of these for each updated document if the update does not
> already exist on the target.
> I believe the response is a JSON array:
>
> [{"ok":{"_id":"docid","_rev":"1-23420432", ..rest of doc}},
> {"missing":"some-bad-rev"}]
>
> The "missing" case is very rare and is usually the result of somebody racing
> the replicator.
>
> ### GET /db/docid/attachment?rev=1-234923042
>
> Attachments are downloaded separately during pull replication. The correct
> response is the binary data.
>
> ### PUT /db/_local/<rep id>
>
> Periodically the replicator will try to save an updated _local doc with the
> new replication history. The response is {"ok":true, "rev":NewRevId}
>
> That's it for pull replication.
>
> ## Push Replication (local source, remote target)
>
> The _local doc calls are still there, but now we have two new POSTs:
>
> POST /db/_missing_revs -d '{"docid1":["1-24323423"], "docid2":["2-23434534"]}'
>
> This is the replicator asking the target if these document revisions are
> already saved there. The response is a list of the ones that are missing:
>
> {"missing_revs":{"docid2":["2-23434534"]}}
>
> POST /db/_bulk_docs -d '{"new_edits":false, "docs":[... array of documents
> ...]}'
>
> This one is exactly like the regular _bulk_docs call. The new_edits:false
> parameter tells the target not to throw a conflict, but instead to save all
> these updates, as conflict revisions if necessary. Currently attachments are
> inlined, although in 0.11 we'll be doing special multipart PUTs for documents
> with attachments instead of using _bulk_docs (so we don't need to Base64
> encode them). Best,
>
> Adam
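
For anyone implementing the remote side of this, here is a minimal Python sketch of two of the steps described above: building the _changes request the replicator sends during a pull, and answering a _missing_revs POST during a push. It is only an illustration under assumptions, not how CouchDB itself does it. The function names (changes_url, missing_revs) and the in-memory `stored` dict standing in for a database are hypothetical; the paths, query parameters, and JSON shapes are taken from the note above.

```python
from urllib.parse import urlencode


def changes_url(db, since, continuous=False, heartbeat=10000):
    """Build the _changes path the replicator requests from the source.

    `since` is the source_last_seq recovered from the _local checkpoint doc.
    """
    params = [("style", "all_docs"), ("heartbeat", heartbeat), ("since", since)]
    if continuous:
        # Only sent when the replication is meant to be permanent.
        params.append(("feed", "continuous"))
    return "/%s/_changes?%s" % (db, urlencode(params))


def missing_revs(stored, asked):
    """Answer a _missing_revs request in the shape described in the note.

    stored: doc id -> set of revision ids the target already has
            (hypothetical stand-in for a real database lookup).
    asked:  decoded request body, doc id -> list of revision ids.
    """
    missing = {}
    for doc_id, revs in asked.items():
        known = stored.get(doc_id, set())
        absent = [rev for rev in revs if rev not in known]
        if absent:
            # Documents whose revisions are all present are left out entirely.
            missing[doc_id] = absent
    return {"missing_revs": missing}
```

With a target that already holds docid1 at 1-24323423, passing the request body from the note to missing_revs yields the response shown there, listing only docid2.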
