CC'ing dev@ because it is a dev issue.

On 6 Mar 2009, at 19:17, Chris Anderson wrote:

On Fri, Mar 6, 2009 at 10:05 AM, Jason Smith <[email protected] > wrote:
Hi, list.

While I am happy to be learning Couch for a new project, I am still unsure about some tricks that I used with Django and Rails, such as data migration:

For example, suppose I change my code and instead of using a string
timestamp in my documents, I would prefer a hash with "day", "month", and "year" keys. When I deploy the new code into production, obviously I want
the data structures to change for all existing documents.

So my question is: What is the preferred or recommended method to do this? So far, the only thing I can think of is to write some client code to do
the following:

1. Fetch _all_docs
2. For each document that requires changing, modify it
3. Either PUT the new documents up one by one, or POST them to _bulk_docs,
depending on the situation.

This solution doesn't strike me as particularly horrible, but I was
wondering if there is a better way, perhaps something server-side.
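
A minimal sketch of the three steps above, kept to pure functions so it runs without a server. It assumes the field is named `timestamp` and that the old string starts with `YYYY-MM-DD`; neither detail is given in the question, and `migrateDoc` and `buildBulkDocs` are hypothetical helper names. The input is the parsed response of `GET /db/_all_docs?include_docs=true`, and the output is a body suitable for `POST /db/_bulk_docs`.

```javascript
// Convert one document from a string timestamp to a {day, month, year} hash.
// Returns null when the document needs no change.
// Assumption: the field is called "timestamp" and begins with YYYY-MM-DD.
function migrateDoc(doc) {
  if (typeof doc.timestamp !== "string") return null;
  const m = /^(\d{4})-(\d{2})-(\d{2})/.exec(doc.timestamp);
  if (!m) return null; // unrecognised format: leave it for manual inspection
  // Copy the document so _id and _rev are carried over unchanged.
  return Object.assign({}, doc, {
    timestamp: { day: Number(m[3]), month: Number(m[2]), year: Number(m[1]) },
  });
}

// Build the _bulk_docs request body from a parsed
// _all_docs?include_docs=true response, keeping only changed documents.
function buildBulkDocs(allDocsResponse) {
  const docs = allDocsResponse.rows
    .map((row) => migrateDoc(row.doc))
    .filter((doc) => doc !== null);
  return { docs };
}
```

Because each migrated document keeps its `_rev`, a document that was modified by someone else in the meantime is rejected as a conflict instead of being overwritten, and can be picked up on a later pass.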

This is basically the way to do it. If you want to be sure you've got
it right, the thing to do is create a view that emits for all docs
with the old timestamp format. Then you can process docs from that
view, until it is empty. This way you can be sure no docs slip through
the cracks.
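
A sketch of such a view's map function, assuming the field is named `timestamp` (the thread does not name it). Inside CouchDB the view server supplies `emit()`; the stub here only stands in for it so the function can be tried under Node. The name `mapOldTimestamps` is hypothetical.

```javascript
// Local stand-in for the emit() that CouchDB's view server provides.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function: emit one row for every document that still carries
// the old string timestamp. Migrated documents emit nothing, so the
// view is empty once the migration is complete.
function mapOldTimestamps(doc) {
  if (typeof doc.timestamp === "string") {
    emit(doc._id, null);
  }
}
```

Only the body of `mapOldTimestamps` would go into the design document; the client then repeats query, migrate, and write until the view returns no rows.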

A migration function, written in JavaScript and executed on the
server, would fit the CouchDB model; it just has not been implemented
yet. So the above is the way to proceed for the foreseeable future.

It occurred to me that the easiest way to implement this would be the
introduction of a "compaction function". Instead of sending an empty
POST request to `/db/_compact` a user sends a JSON body that
includes a compaction function and potentially options (or just
the plain JS function, doesn't matter). The compaction routine would
then launch a query server and pipe all latest documents through
the function and write out the results into the new DB.
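
To make the idea concrete, here is a sketch of what such a request could look like. Nothing in it exists in CouchDB: the body keys `function` and `options`, the per-document signature, and the names `compactFun` and `simulateCompaction` are all invented for illustration. The `timestamp` field and its `YYYY-MM-DD` prefix are likewise assumptions. The simulation only mimics "pipe every latest document through the function and write the result".

```javascript
// Hypothetical compaction function: receives the latest revision of a
// document and returns the document to write into the new DB file.
function compactFun(doc) {
  if (typeof doc.timestamp !== "string") return doc;
  const m = /^(\d{4})-(\d{2})-(\d{2})/.exec(doc.timestamp);
  if (!m) return doc; // unknown format: copy the document unchanged
  return Object.assign({}, doc, {
    timestamp: { day: Number(m[3]), month: Number(m[2]), year: Number(m[1]) },
  });
}

// Hypothetical JSON body for POST /db/_compact; the key names are
// made up here, the proposal does not fix a format.
const compactRequestBody = JSON.stringify({
  function: compactFun.toString(),
  options: {},
});

// Local stand-in for the proposed compaction routine.
function simulateCompaction(latestDocs, fun) {
  return latestDocs.map((doc) => fun(doc));
}
```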

Of course, the current behaviour stays in place and remains
the default case. The proposed method would only help when
changing document structures in large deployments.

One problem I see is timing issues with client code and multiple
nodes. Client libs wouldn't know when to expect which document
structure, or would have to be needlessly complex. But I think that's
a deployment issue in general; CouchDB could provide notifications
to help with it, but could not solve the problem in general.

Is this worth thinking about?

Cheers
Jan