On Mar 11, 2009, at 7:07 PM, Chris Anderson wrote:
On Wed, Mar 11, 2009 at 8:34 AM, Damien Katz <[email protected]>
wrote:
On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:
On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <[email protected]>
wrote:
This patch breaks the file format and replication API, so replication
with earlier versions is not possible.
The rev format has changed. Does this mean that migrating existing
data will involve getting each doc from oldDB, stripping the _rev, and
loading it into newDB?
Yes, but it should be possible to convert the revs to the new
format too.
But why?
It should be pretty straightforward to write a Python or Ruby script
that does this in bulk to transfer docs. It's essentially a version of
the python dump / load tools that doesn't require putting the whole db
on disk as an intermediary.
I'll volunteer but I wonder how I should handle docs with conflicts in
the oldDB?
Oh that's why. Using the replicator API would work for that.
A little confused as to the plan here. Let me try to articulate:
Write a script that pulls all_docs_by_seq from the old version of
CouchDB in batches of 1000, and for each doc loads the head rev (and
any conflict revs) into memory.
Then it creates a bulk_docs POST for those docs. For any doc without
conflicts it strips the rev; for any doc with conflicts it creates a
series of revs like this (pretend there are 199 conflict revs)
1-sdfjhgsaf
2-asdfkjsad
..
199-asdf7tsfd
and applies the revs to each doc in the conflict set. Does the rev
ordering matter? Assuming I don't reuse the prefix number, does the
format/length of the second rev part matter?
Then using a normal POST of an object like {"docs":[...array of
docs...]} to the /db/_bulk_docs URL (with no special query option),
the new docs (and conflict revs) will get stored in the new DB?
Or do I need to assign well-formed made up revs to the non-conflicting
docs (they'd all get "1-foobar") and use the ?new_edits=false option
on the bulk_docs POST ?
To use new_edits=false, you have to specify a rev history in each
doc's _revisions property, like this:
{"new_edits": false,
 "docs": [
  {"_id": "foo", "_revisions": {"start": 2, "ids": ["133457546", "475133454"]}}
 ]}
The ids are the rev ids without the leading offset; they are sent this
way for efficiency. Converting to regular revs, they would look like
"2-133457546" and "1-475133454".
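A rough sketch of that conversion, in case it helps anyone writing a
migrator. The function name expand_revisions is made up for
illustration; the only thing taken from the description above is that
the ids are listed newest first and the offset counts down from start.

```python
def expand_revisions(revisions):
    """Turn a _revisions object into full rev strings, newest first.

    Assumes (per the description above) that `ids` is ordered newest
    first and that `start` is the offset of the first id in the list.
    """
    start = revisions["start"]
    ids = revisions["ids"]
    # Each older id gets an offset one lower than the one before it.
    return [f"{start - i}-{rev_id}" for i, rev_id in enumerate(ids)]


example = {"start": 2, "ids": ["133457546", "475133454"]}
print(expand_revisions(example))  # ['2-133457546', '1-475133454']
```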
For importing existing docs, I think you could just use the
all_or_nothing:true option and save the multiple copies of the same
documents and they'll all be saved, and you don't have to worry about
the _revisions stuff.
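A minimal sketch of what the request body for that could look like.
build_bulk_payload is a hypothetical helper, and stripping the old
_rev from every copy is an assumption carried over from the plan
earlier in this thread, not something confirmed here. It only builds
the JSON body; sending it to /db/_bulk_docs is left out.

```python
import json


def build_bulk_payload(docs, all_or_nothing=True):
    """Build a _bulk_docs request body from docs read out of the old DB.

    Assumption: old-format _rev values are dropped so the new DB
    assigns its own revs. Several copies of one _id are kept as-is.
    """
    cleaned = []
    for doc in docs:
        copy = dict(doc)        # leave the caller's dict untouched
        copy.pop("_rev", None)  # assumption: old revs are not reused
        cleaned.append(copy)
    return {"all_or_nothing": all_or_nothing, "docs": cleaned}


# Two conflicting copies of the same document from the old DB
# (the _rev values here are placeholders, not real old-format revs).
old_docs = [
    {"_id": "foo", "_rev": "old-rev-a", "value": 1},
    {"_id": "foo", "_rev": "old-rev-b", "value": 2},
]
print(json.dumps(build_bulk_payload(old_docs)))
```

Both copies of "foo" stay in the docs array, which is the "multiple
copies of the same documents" case described above.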
-Damien
I think getting this clear on the list will help everyone's
understanding of the new bulk_docs semantics. (I don't plan to include
in my migrator the ability to transfer any docs which would be lost on
the source DB during compaction... only the HEAD rev and any conflicts
will be transferred.)
Chris
ps I tagged trunk as bulk_transactions (maybe coulda picked a better
name) so we have a record of the last point of 0.9 development that
had the old semantics. Please don't use this tag.
--
Chris Anderson
http://jchris.mfdz.com