On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies<[email protected]> wrote:
>
> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>
>> 2009/7/31 Jason Davies <[email protected]>:
>>
>>> The main points of this proposal are:
>>>
>>> 1. Store the historical versions of documents in a separate database.
>>> This is for a number of reasons: a) keeping it separate means we don't
>>> clog up the main database with historical data b) history-specific
>>> views can be kept here c) non-intrusive implementation of this is
>>> easier.
>>> 2. The change will be made at the couch_db layer so that *any* change
>>> to any document in the target database will be mirrored to the history
>>> database.
>>
>> Seems good.
>>
>>> 3. Each and every change to a document will result in a new document
>>> being created in the history database (with a new ID) containing an
>>> exact copy of that document e.g. {_id: <new ID>, doc: <exact copy of doc> }.
>>
>> How would you handle the case of attachments? If attachments are copied
>> for each revision of a doc, it would take a lot of space. Maybe
>> storing attachments in their own docs could be a solution though. So
>> storing a revision would be:
>>
>> store attachments in different docs
>> create a doc {_id: <id>, doc: <doc>, attachments: [<id1>, ...]}
>>
>> Attachments would be tested across revisions by their signature;
>> if the signature changes, a new attachment doc is created.
>>
>> Just a thought anyway.
>
> Good idea, the disk space issue would be quite important for larger
> databases with larger number of changes. I wonder if some kind of
> alternative storage layer supporting diffs would help here. Probably
> something to consider as a future improvement.
>
>>
>>
>>> 4. Adding meta-data to changes can be handled by a custom _update handler
>>> (yet to be developed) to set fields such as "last_modified" and
>>> "last_modified_user".
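
For illustration only, here is a rough Python sketch of point 3 combined with the attachment suggestion quoted above. It is not CouchDB code: plain dicts stand in for the history and attachment databases, and the function name record_revision, the use of SHA-256 as the "signature", and the shape of the attachment docs are all assumptions made for this example.

```python
import copy
import hashlib
import uuid


def record_revision(history_db, attachment_db, doc, attachments):
    """Store one history entry for `doc` and return the new history doc ID.

    `history_db` and `attachment_db` are plain dicts standing in for
    databases (an assumption of this sketch, not a CouchDB API).
    `attachments` maps attachment name -> bytes.
    """
    attachment_ids = []
    for name in sorted(attachments):
        data = attachments[name]
        # The signature is a content hash, so an unchanged attachment maps
        # to the same attachment doc across revisions and is stored once.
        signature = hashlib.sha256(data).hexdigest()
        if signature not in attachment_db:
            attachment_db[signature] = {
                "_id": signature,
                "name": name,
                "data": data,
            }
        attachment_ids.append(signature)

    # Every change gets a fresh ID and an exact copy of the document.
    history_id = uuid.uuid4().hex
    history_db[history_id] = {
        "_id": history_id,
        "doc": copy.deepcopy(doc),
        "attachments": attachment_ids,
    }
    return history_id
```

With this layout, two revisions that carry the same attachment bytes produce two history docs but only one attachment doc, which is the disk-space saving discussed above.
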

I've been quiet on this thread as I'm largely in agreement with the
proposal. I think the best route for implementation is to allow Erlang
callbacks on changes. This way we can write a simple history function that
copies off each change to a backup db, setting timestamps and userCtx
metadata on the way.

The user interface could surface this function's activation in the node
config as a check box, and applications wouldn't need to know about it at
all.

It should be possible to develop a generic futon-like interface for
browsing old documents to revert individual changes, so users can work
with non-backup-aware applications.

As far as keeping track of time ranges when backups are turned off, the
user interface could record a timestamped metadata document to the backup
db whenever the switch is flipped.

Chris

>>
>> Why not add date metadata when storing a revision? The obvious ones,
>> I mean: userCtx and date.
>
> My idea was that userCtx and date could be stored using _update, or do you
> think this should be done automatically? It's certainly a possibility but I
> wouldn't want to add unnecessary data if the user doesn't need it, although
> I imagine in 99% of cases they would need the "date/time" of the change in
> the history.
>
>>
>>>
>>> One use case we'd like to support is effectively (from the point of view
>>> of the user) being able to "roll back" a view to a specific point in
>>> time, but how this would look in the history database has me stumped so
>>> far. Rolling back a specific doc is easy, but multiple docs, not so easy
>>> it seems. Any suggestions welcome!
>>>
>>
>> Rolling back could be handled by a view based on date in the history
>> database?
>
> Indeed, but I haven't been able to come up with such a view without blowing
> the reduce limitations. I want to do something like fetch all the latest
> history docs that were changed before some particular date. As Jan pointed
> out though, this could be solved using snapshot databases instead.
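
To make the "latest history docs changed before some particular date" selection concrete, here is a small Python sketch that does it in memory over a list of history docs. It says nothing about how to express this as a CouchDB view. The "timestamp" field, the function name snapshot_at, and the assumption that each copied doc keeps its original _id are all hypothetical choices for this example; timestamps can be any mutually comparable values.

```python
def snapshot_at(history_docs, cutoff):
    """Return {original_doc_id: doc} as the documents stood at `cutoff`.

    Each history doc is assumed to look like
    {"_id": ..., "timestamp": ..., "doc": {"_id": ..., ...}}.
    """
    latest = {}
    for entry in history_docs:
        ts = entry["timestamp"]
        if ts > cutoff:
            # Changed after the point in time we are rolling back to.
            continue
        doc_id = entry["doc"]["_id"]
        # Keep only the most recent entry per original document.
        if doc_id not in latest or ts >= latest[doc_id]["timestamp"]:
            latest[doc_id] = entry
    # A document whose latest entry is a deletion did not exist at `cutoff`.
    return {
        doc_id: entry["doc"]
        for doc_id, entry in latest.items()
        if not entry["doc"].get("_deleted")
    }
```

The per-document "keep the newest" step is the part that is awkward to fit into a reduce function, which is why doing it client-side or against a snapshot database may be simpler.
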
>
> --
> Jason Davies
>
> www.jasondavies.com
>
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
