Adam Kocoloski wrote:
Simon Metson reminded me that I wrote down something like this for him a few
months back. Here it is. It describes the replication workflow using inline
document attachments, rather than the more efficient multipart requests which
are supported in 0.11. Hope it helps. Regards,
A good starting point. Thanks! Probably worth putting somewhere on the
wiki for future reference.
And...
In terms of broader architectural overviews, you may find Ricky Ho's set of
articles useful:
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
Exactly what I was looking for. Thanks again! (And now that I know
what to look for, I found the link to it on the couch wiki).
Miles
So, the sequence of calls depends on whether you're pulling updates from this
remote server or pushing updates to it. Let's consider the two cases
separately:
## Pull Replication (remote source, local target)
### HEAD /db
Respond with a 200 status code and you're good.
### GET /db/_local/<rep id>
The replicator checkpoints its progress in these _local documents. You can
respond with a 404 if you like; otherwise, the response should be JSON that
looks very much like a replication response, e.g. the one described here:
http://books.couchdb.org/relax/reference/replication#Replication%20in%20Detail
Basically, if the _local doc exists on both the source and target DBs, and the
documents agree on the value of "source_last_seq", the replicator will start
from that update sequence on the source.
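A minimal Python sketch of that check, assuming both _local docs have already been fetched and parsed (None stands in for a 404). The function name is made up for illustration. The real replicator also looks at the "history" entries; this only does the simple source_last_seq comparison described above.

```python
def start_seq(source_doc, target_doc):
    """Return the source update sequence to resume from (0 = start over).

    source_doc / target_doc are the parsed _local/<rep id> documents from
    each side, or None if that side answered 404. Hypothetical helper.
    """
    if source_doc is None or target_doc is None:
        return 0  # no checkpoint on one side: replicate from the beginning
    src = source_doc.get("source_last_seq")
    tgt = target_doc.get("source_last_seq")
    if src is not None and src == tgt:
        return src  # both sides agree, so resume from that sequence
    return 0
```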
### GET /db/_changes?style=all_docs&heartbeat=10000&since=N[&feed=continuous]
This is the hard part. The replicator makes this request on a separate
connection to your server, asking for a list of changes since N (the
source_last_seq from the previous step). If the replication is meant to be
permanent, the feed=continuous parameter will be supplied. The best reference
for the response format is definitely the O'Reilly book:
http://books.couchdb.org/relax/reference/change-notifications
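For anyone implementing the server side, here is a rough Python sketch of how the rows might be shaped for both feed styles. The row layout (seq, id, changes) follows the change-notifications chapter linked above; the helper names, the in-memory list of rows, and echoing `since` back as last_seq when nothing has changed are assumptions of the sketch.

```python
import json

def change_row(seq, doc_id, leaf_revs, deleted=False):
    """One row of a _changes response. With style=all_docs every leaf
    revision of the document is listed, not just the winning one."""
    row = {"seq": seq, "id": doc_id, "changes": [{"rev": r} for r in leaf_revs]}
    if deleted:
        row["deleted"] = True
    return row

def rows_since(rows, since):
    """Keep only the changes after the client's ?since=N value."""
    return [row for row in rows if row["seq"] > since]

def normal_feed(rows, since):
    """Body for a request without feed=continuous: a single JSON object."""
    selected = rows_since(rows, since)
    # Assumption: echo `since` back when there is nothing new to report.
    last_seq = selected[-1]["seq"] if selected else since
    return json.dumps({"results": selected, "last_seq": last_seq})

def continuous_feed(rows, since):
    """Lines for feed=continuous: one JSON object per line. A real server
    keeps the socket open afterwards and writes a bare newline every
    `heartbeat` milliseconds while idle; that loop is not modelled here."""
    for row in rows_since(rows, since):
        yield json.dumps(row) + "\n"
```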
### GET /db/docid?revs=true&latest=true&open_revs=["1-23420432",...]
You'll see one of these for each updated document if the update does not
already exist on the target. I believe the response is a JSON array like this:
[{"ok":{"_id":"docid","_rev":"1-23420432", ..rest of doc},
{"missing":"some-bad-rev"}]
The "missing" case is very rare and is usually the result of somebody racing
the replicator.
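Note that open_revs is a JSON array, so it has to be URL-encoded into the query string. Below is a hypothetical pair of Python helpers: one builds the request, the other produces the array response from an in-memory store keyed by (doc id, rev). With revs=true the real response also carries a _revisions field for each doc; that is left out here.

```python
import json
from urllib.parse import quote, urlencode

def open_revs_url(db, doc_id, revs):
    """Build the GET path the replicator sends for one document."""
    query = urlencode({
        "revs": "true",
        "latest": "true",
        "open_revs": json.dumps(revs),  # JSON array, percent-encoded by urlencode
    })
    return "/%s/%s?%s" % (quote(db, safe=""), quote(doc_id, safe=""), query)

def open_revs_response(stored, doc_id, revs):
    """stored maps (doc_id, rev) -> document body without _id/_rev.
    Each requested rev yields {"ok": doc} or {"missing": rev}."""
    out = []
    for rev in revs:
        body = stored.get((doc_id, rev))
        if body is None:
            out.append({"missing": rev})
        else:
            doc = {"_id": doc_id, "_rev": rev}
            doc.update(body)
            out.append({"ok": doc})
    return out
```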
### GET /db/docid/attachment?rev=1-234923042
Attachments are downloaded separately during pull replication. The correct
response is the binary data.
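Assuming the documents in the open_revs response mark each attachment with a "stub": true entry (that's how I'd expect a stubbed attachment to look; treat it as an assumption), the replicator's follow-up requests could be derived along these lines. attachment_urls is a hypothetical helper.

```python
from urllib.parse import quote

def attachment_urls(db, doc):
    """Yield (name, path) for every stubbed attachment of a fetched doc.
    Each path is fetched separately; the response body is the raw bytes."""
    for name, info in sorted(doc.get("_attachments", {}).items()):
        if info.get("stub"):
            path = "/%s/%s/%s?rev=%s" % (
                quote(db, safe=""),
                quote(doc["_id"], safe=""),
                quote(name),  # attachment names may legitimately contain '/'
                quote(doc["_rev"], safe=""),
            )
            yield name, path
```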
### PUT /db/_local/<rep id>
Periodically the replicator will try to save an updated _local doc with the new replication
history. The response is {"ok":true, "rev":NewRevId}
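On the server side, _local docs need only a trivial store, since they are never replicated. Here is a minimal in-memory sketch; the class name is hypothetical and there is no rev conflict check. As I understand it, CouchDB numbers _local revisions 0-1, 0-2, ..., which is mirrored here, but any opaque string should do.

```python
class LocalDocs:
    """In-memory store for _local docs (no revision tree, no conflicts)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        """(status, body) for GET /db/_local/<rep id>; 404 if absent."""
        doc = self._docs.get(doc_id)
        if doc is None:
            return 404, {"error": "not_found", "reason": "missing"}
        return 200, doc

    def put(self, doc_id, body):
        """(status, body) for PUT /db/_local/<rep id>; bumps the counter rev."""
        current = self._docs.get(doc_id)
        n = int(current["_rev"].split("-")[1]) if current else 0
        doc = dict(body, _id=doc_id, _rev="0-%d" % (n + 1))
        self._docs[doc_id] = doc
        return 201, {"ok": True, "id": doc_id, "rev": doc["_rev"]}
```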
That's it for pull replication.
## Push replication (local source, remote target)
The _local doc calls are still there, but now we have two new POSTs:
POST /db/_missing_revs -d '{"docid1":["1-24323423"], "docid2":["2-23434534"]}'
This is the replicator asking the target if these document revisions are
already saved there. The response is a list of the ones that are missing:
{"missing_revs":{"docid2":["2-23434534"]}}
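Answering _missing_revs amounts to a set difference per document. A Python sketch, assuming the target keeps the revisions it has stored for each doc id in a set (the storage layout is hypothetical):

```python
def missing_revs(stored, request):
    """stored maps doc_id -> set of revs already saved on the target;
    request is the parsed POST body, doc_id -> list of revs."""
    missing = {}
    for doc_id, revs in request.items():
        have = stored.get(doc_id, set())
        absent = [r for r in revs if r not in have]
        if absent:  # documents with nothing missing are left out entirely
            missing[doc_id] = absent
    return {"missing_revs": missing}
```

Fed the request above with only docid1's revision already stored, it produces exactly the response shown.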
POST /db/_bulk_docs -d '{"new_edits":false, "docs":[... array of documents ...]}'
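The paragraph below explains what new_edits:false means for the target. As a rough illustration, here is a Python sketch that applies such a body to a flat, in-memory map of revisions and decodes the inline Base64 attachments. It does not model the revision tree (real CouchDB uses each doc's _revisions history to graft it into place), and it doesn't build the HTTP response; the function name and storage layout are assumptions.

```python
import base64

def save_replicated_docs(store, body):
    """Apply a _bulk_docs body sent by the replicator.

    store maps doc_id -> {rev: (doc, attachments)}. Every incoming revision is
    kept under the _rev it already carries; a second rev for the same id just
    sits alongside the first (a conflict) instead of being rejected.
    """
    if body.get("new_edits", True):
        raise ValueError("this sketch only handles new_edits: false")
    for doc in body["docs"]:
        doc = dict(doc)  # don't mutate the caller's dict
        inline = doc.pop("_attachments", {})
        # Inline attachments arrive Base64-encoded in a "data" field.
        attachments = {
            name: base64.b64decode(info["data"])
            for name, info in inline.items()
            if "data" in info
        }
        store.setdefault(doc["_id"], {})[doc["_rev"]] = (doc, attachments)
```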
This one is exactly like the regular _bulk_docs call. The new_edits:false
parameter tells the target not to throw conflict errors, but instead to save all
these updates, as conflicting revisions if necessary. Currently attachments are
inlined, although in 0.11 we'll be doing special multipart PUTs for documents
with attachments instead of using _bulk_docs (so we don't need to Base64 encode
them).
Best,
Adam
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra