On 3 Aug 2009, at 17:26, Rune Skou Larsen wrote:

Damien Katz wrote:
2009/7/31 Jason Davies <[email protected]>:
The main points of this proposal are:

1. Store the historical versions of documents in a separate database.
This is for a number of reasons: a) keeping it separate means we don't
clog up the main database with historical data b) history-specific
views can be kept here c) non-intrusive implementation of this is
easier.

Some comments about the proposal

1. The callbacks must be synchronous. Queueing them for writing later
means the queue can get overloaded and changes lost.
2. Changes can still get lost. We don't have commits across dbs, so
it's possible a crash during update will put the main and history dbs
out of sync.
3. Replicated changes get lost. If a client makes 5 edits to a local
replica of a document, then replicates it to a server db, only the
most recent change gets recorded in the history.

I would prefer to store the history as attachments to the main document.

-Damien

I agree that _all versions of a document should be in the same database_ because the commit scope of a change should include saving the undo history.
What good is unreliable undo?

But also for other reasons:
1) Future versions
In my company, we need a system where we can replicate data to all
couchdb-instances before it should be used. This is also very common in
the CMS-world for scheduling a change to the website. So we need to
be able to store a future version, which becomes valid at a specified
time and make the "invisible" change between versions (we use a url
rewrite). That's very tough if current data and history data are in
separate databases and in different formats.

2) Applying views
Viewing historic docs should be as powerful as viewing "current"
docs. With the proposed format for historic documents, the same view
cannot be applied to both the current and the history db. In fact,
complex views can't be used at all in the history db, since the
one-dimensional view-index must include time.

I dream of a fully temporal couchdb, where all GET requests can include
the point in time for which I want to see the docs through my views,
lists and shows  :-)

Using attachments is not optimal, because there's still the "un-dynamic" distinction between past, current and future, but it's much better than a
separate db. The attachments proposal retains the possibility to
manipulate versions of the same doc in one commit scope.

We've just been discussing this some more on IRC and Benoît suggested adding a "_history" member to allow historical versions of documents to be stored there (essentially as attachments, because doc._history would by default only contain stubs). I'd prefer not to overpopulate the "_" namespace so I'm not set on adding doc._history but let's run with this for this discussion.

The stubs would contain basic metadata: the last-modified timestamp and the userCtx that modified the doc. (Perhaps we can do away with doc._history and add this metadata to the attachment metadata? Or decide on a format for the attachment filename, e.g. _history/<timestamp>/<userCtx>.json?)
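
To make the filename idea concrete, here is a minimal sketch in plain Python. It does not talk to CouchDB at all; history_attachment_name and history_stub are hypothetical helpers, and using only the user's name from the userCtx (rather than the whole object) is an assumption for illustration.

```python
import json
from urllib.parse import quote


def history_attachment_name(timestamp: str, user_name: str) -> str:
    """Build a name of the form _history/<timestamp>/<userCtx>.json.

    Both parts are percent-encoded so that characters such as "/" or ":"
    cannot change the number of path segments.
    """
    return "_history/{}/{}.json".format(
        quote(timestamp, safe=""), quote(user_name, safe="")
    )


def history_stub(old_doc: dict, timestamp: str, user_name: str) -> dict:
    """Describe one historical version without carrying its body around."""
    body = json.dumps(old_doc, sort_keys=True).encode("utf-8")
    return {
        "name": history_attachment_name(timestamp, user_name),
        "stub": True,                      # the body lives in the attachment
        "content_type": "application/json",
        "length": len(body),               # size of the serialised old version
        "timestamp": timestamp,            # hypothetical metadata field
        "user": user_name,                 # hypothetical metadata field
    }
```

With this layout the timestamp and user could be recovered from the filename alone, which is what would let us drop doc._history if we wanted to.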

This would then make it easy to write views that manipulate the history via the doc._history stubs. I'm thinking we probably only want to send the stubs to the view server, as serialising all the historical data for each doc could get CPU-hungry.
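
As a rough illustration of a view that only needs the stubs, here is a plain-Python stand-in for a map function, with yield playing the role of emit. The shape of doc._history (a list of stubs carrying "timestamp" and "user") is an assumption, since nothing has been decided yet.

```python
def map_history(doc):
    """Yield one (key, value) row per history stub of a document.

    Only the stub metadata is read, so the historical bodies never need
    to be serialised for the view server.
    """
    for stub in doc.get("_history", []):
        # Key by [doc id, timestamp] so rows for one doc sort by time.
        yield [doc["_id"], stub["timestamp"]], {"user": stub["user"]}
```

Querying such a view with a start and end key on the doc id would give the edit log of a single document, ordered by time.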

The other question is whether to make this a db-wide setting, perhaps a special doc so that it will be replicated (_history_settings), or perhaps put it in design docs, or do we want to configure it on a per-doc level? Rune suggested something like { _history_settings: { num_docs: 10, ... } }. I would probably lean towards putting it in design docs, so that the decision can be made by the app developer.
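
A small sketch of how the design-doc option might behave, again in plain Python. The "history_settings" key in the design doc, the default of 10, and reading num_docs as "maximum number of historical versions to keep" are all assumptions; the stubs are assumed to carry ISO 8601 UTC timestamps, which sort correctly as strings.

```python
DEFAULT_SETTINGS = {"num_docs": 10}  # hypothetical default


def history_settings(design_doc: dict) -> dict:
    """Merge the design doc's settings over the defaults."""
    settings = dict(DEFAULT_SETTINGS)
    settings.update(design_doc.get("history_settings", {}))
    return settings


def prune_history(stubs: list, num_docs: int) -> list:
    """Keep only the num_docs most recent stubs, newest first."""
    newest_first = sorted(stubs, key=lambda s: s["timestamp"], reverse=True)
    return newest_first[:num_docs]
```

Keeping the setting in a design doc means it replicates along with the application, so every replica prunes history the same way.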

There is a possibility that this could be implemented in the _update handler but I'd strongly prefer to have a core module written in Erlang for performance reasons, and to make it easier for people to turn it on and off.

Finally, whartung pointed out this paper: http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf which contains some interesting info on using B-trees to support snapshots. Maybe someone can comment on the feasibility of supporting that?

Comments welcomed!
--
Jason Davies

www.jasondavies.com
