Re: History Proposal

Jason Davies Mon, 03 Aug 2009 10:22:09 -0700


On 3 Aug 2009, at 17:26, Rune Skou Larsen wrote:

Damien Katz wrote:

2009/7/31 Jason Davies <[email protected]>:
The main points of this proposal are:

1. Store the historical versions of documents in a separate database.
This is for a number of reasons: a) keeping it separate means we don't
clog up the main database with historical data b) history-specific views
can be kept here c) non-intrusive implementation of this is easier.

Some comments about the proposal

1. The callbacks must be synchronous. Queueing them for writing later
means the queue can get overloaded and changes lost.
2. Changes can still get lost. We don't have commits across dbs, so
it's possible a crash during update will put the main and history dbs
out of sync.
3. Replicated changes get lost. If a client makes 5 edits to a local
replica of a document, then replicates it to a server db, only the
most recent change gets recorded in the history.
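
Point 3 can be illustrated with a toy model. This is not CouchDB code: `ToyDb`, `put` and `replicate_to` are hypothetical names, and the sketch assumes that history is recorded by a per-write callback and that replication ships only the current revision of each document.

```python
class ToyDb:
    """Hypothetical in-memory database with an on-write history callback."""

    def __init__(self):
        self.docs = {}      # doc id -> current body
        self.history = []   # (doc id, body) pairs recorded by the callback

    def put(self, doc_id, body):
        # The history callback fires once per write seen by *this* database.
        self.docs[doc_id] = body
        self.history.append((doc_id, body))

    def replicate_to(self, target):
        # Assumption: replication transfers only the current revision.
        for doc_id, body in self.docs.items():
            target.put(doc_id, body)


local, server = ToyDb(), ToyDb()
for n in range(1, 6):
    local.put("page", {"text": f"edit {n}"})
local.replicate_to(server)

print(len(local.history))   # number of writes the local replica recorded
print(server.history)       # what the server's history callback saw
```

Under these assumptions the local replica records all five edits, while the server's history holds only the last one.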

I would prefer to store the history as attachments to the main document.


-Damien

I agree that _all versions of a document should be in the same database_ because the commit-scope of a change should include saving the undo-history.

What good is unreliable undo?

But also for other reasons:
1) Future versions
In my company, we need a system where we can replicate data to all
couchdb-instances before it should be used. This is also very common in
the CMS-world for scheduling a change to the website. So we need to
be able to store a future version, which becomes valid at a specified
time and make the "invisible" change between versions (we use a url
rewrite). That's very tough if current data and history data are in
separate databases and in different formats.

2) Applying views
Viewing historic docs should be as powerful as viewing "current"
docs. With the proposed format for historic documents, the same view
cannot be applied on current and history db. In fact, complex views
can't be used at all in the history db, since the one-dimensional
view-index must include time.

I dream of a fully temporal couchdb, where all GET requests can include
the point in time for which I want to see the docs through my views,
lists and shows :-)
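
As a rough illustration of such a temporal lookup, the sketch below picks the version of a document that is valid at a requested point in time, including a version scheduled for the future. The `versions` layout and the `version_at` helper are assumptions made for this example, not an existing CouchDB API.

```python
from bisect import bisect_right


def version_at(versions, when):
    """Return the body valid at time `when`, or None if none is valid yet.

    `versions` is a list of (valid_from, body) tuples sorted by valid_from.
    """
    starts = [valid_from for valid_from, _ in versions]
    idx = bisect_right(starts, when)
    return versions[idx - 1][1] if idx else None


versions = [
    (100, {"title": "old"}),
    (200, {"title": "current"}),
    (300, {"title": "scheduled"}),  # future version, replicated ahead of time
]

print(version_at(versions, 250))  # {'title': 'current'}
print(version_at(versions, 300))  # {'title': 'scheduled'}
print(version_at(versions, 50))   # None
```

Because past, current and future versions live in one sorted list, the switch to the scheduled version needs no write at the moment it becomes valid.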

Using attachments is not optimal, because there's still the "un-dynamic"
distinction between past, current and future, but it's much better than a
separate db. The attachments-proposal retains the possibility to
manipulate versions of the same doc in one commit-scope.

We've just been discussing this some more on IRC and Benoît suggested adding a "_history" member to allow historical versions of documents to be stored there (essentially as attachments, because doc._history would by default only contain stubs). I'd prefer not to overpopulate the "_" namespace so I'm not set on adding doc._history but let's run with this for this discussion.

The stubs would contain basic metadata: last modified timestamp and userCtx that modified the doc (perhaps we can do away with doc._history and add this metadata to the attachment metadata? Or decide on a format for the attachment filename e.g. _history/<timestamp>/<userCtx>.json?)
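
A minimal sketch of the filename idea, assuming the timestamp is an integer number of seconds and that userCtx is reduced to a plain user name without slashes. Both helper functions are hypothetical and only show that such a name can be built and parsed back unambiguously.

```python
import re

# Matches names of the form _history/<timestamp>/<user>.json
HISTORY_NAME = re.compile(r"^_history/(?P<timestamp>\d+)/(?P<user>[^/]+)\.json$")


def history_attachment_name(timestamp, user):
    """Build the attachment name for one historical version."""
    return f"_history/{timestamp}/{user}.json"


def parse_history_attachment_name(name):
    """Return (timestamp, user), or None if `name` is not a history attachment."""
    match = HISTORY_NAME.match(name)
    if match is None:
        return None
    return int(match.group("timestamp")), match.group("user")


name = history_attachment_name(1249320129, "jason")
print(name)
print(parse_history_attachment_name(name))
print(parse_history_attachment_name("photo.png"))
```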

This would then make it easy to write views that manipulate the history via the doc._history stubs. I'm thinking we probably only want to send the stubs to the view server, as serialising all the historical data for each doc could get CPU-hungry.
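
To make the view idea concrete, here is a sketch of a map function written in Python that reads only the stubs. The stub fields `timestamp` and `user` are assumptions, since no stub format has been agreed on, and `map_history` is a hypothetical name.

```python
def map_history(doc):
    """Yield one (key, value) row per history stub: key [user, timestamp]."""
    for stub in doc.get("_history", []):
        yield [stub["user"], stub["timestamp"]], doc["_id"]


doc = {
    "_id": "page",
    "_history": [
        {"timestamp": 100, "user": "rune"},
        {"timestamp": 200, "user": "jason"},
    ],
}

# Rows sorted by key, as a view index would store them.
rows = sorted(map_history(doc))
for key, value in rows:
    print(key, value)
```

Since only the stubs are touched, the full bodies of the historical versions never need to be serialised for this view.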

The other question is whether to make this a db-wide setting, perhaps a special doc so that it will be replicated (_history_settings) or perhaps put it in design docs, or do we want to configure it on a per-doc level? Rune suggested something like { _history_settings: { num_docs: 10, ... } }. I would probably lean towards putting it in design docs, so that the decision can be made by the app developer.
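
One possible reading of `num_docs` is "keep at most this many historical versions per document". The sketch below prunes a list of stubs under that assumption. The interpretation, the `prune_history` helper and the `timestamp` field are all hypothetical.

```python
def prune_history(stubs, num_docs):
    """Keep only the `num_docs` most recent stubs, oldest first."""
    if num_docs <= 0:
        return []
    ordered = sorted(stubs, key=lambda stub: stub["timestamp"])
    return ordered[-num_docs:]


settings = {"_history_settings": {"num_docs": 10}}
stubs = [{"timestamp": t} for t in range(1, 13)]  # 12 historical versions

kept = prune_history(stubs, settings["_history_settings"]["num_docs"])
print(len(kept))
print(kept[0]["timestamp"], kept[-1]["timestamp"])
```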

There is a possibility that this could be implemented in the _update handler but I'd strongly prefer to have a core module written in Erlang for performance reasons, and to make it easier for people to turn it on and off.

Finally, whartung pointed out this paper: http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf which contains some interesting info on using B-trees to support snapshots, maybe someone can comment on the feasibility of supporting that?


Comments welcomed!
--
Jason Davies

www.jasondavies.com
