Hi Brian,

On 4 Aug 2009, at 10:56, Brian Candler wrote:

On Mon, Aug 03, 2009 at 06:21:34PM +0100, Jason Davies wrote:
Comments welcomed!

ISTM that the "historical" versions are already stored, so why duplicate
them in the form of an attachment to a new version? And what about
historical versions of attachments anyway?

Wouldn't it be simpler to:

- keep the historical versions by _rev as they are now

- somehow mark these historical versions as worth keeping or not
 (could be as simple as reusing the _deleted flag)

- make the "worth keeping" versions survive compaction

Then when you PUT a document, you'd have two options: apply the _deleted flag automatically to the old revision, or not. This could be chosen by URL
parameter perhaps.

Some views might want access to historical revs, but perhaps this should be controlled by a view parameter to filter them out for views which are only interested in the most recent one. (Incidentally, I would like views to have
access to live conflicting revs too, but that's a separate issue)


I like the simplicity of your idea, but I'd be interested to hear Damien's opinion on essentially using MVCC revisions as history too. Is there a potential difficulty with doing this that we're missing?
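
To make the proposal concrete, here is a rough Python sketch of compaction that keeps flagged revisions. It is a toy in-memory model, not CouchDB's storage engine: the `Revision` class, the `keep` flag, and the `keep_previous` argument are all made-up names standing in for "somehow mark these historical versions as worth keeping" and for the URL parameter on PUT.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev: str
    body: dict
    keep: bool = False  # hypothetical "worth keeping" marker


def put(revisions, rev, body, keep_previous=False):
    """Append a new revision; optionally flag the previous one as history."""
    if revisions and keep_previous:
        revisions[-1].keep = True
    revisions.append(Revision(rev, body))


def compact(revisions):
    """Return the revisions that survive: the latest plus any flagged ones."""
    if not revisions:
        return []
    latest = revisions[-1]
    return [r for r in revisions if r.keep or r is latest]


revs = []
put(revs, "1-a", {"v": 1})
put(revs, "2-b", {"v": 2}, keep_previous=True)  # keep 1-a as history
put(revs, "3-c", {"v": 3})                      # let 2-b be compacted away
survivors = [r.rev for r in compact(revs)]
```

In this model `1-a` and `3-c` survive compaction while `2-b` is dropped, which is the behaviour the "worth keeping" flag is meant to give.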

You said that it seems unnecessary to duplicate the historical versions as attachments. You may have a point, but in the current way of doing things the duplicates would be removed after compaction. If I understand things correctly, an attachment is only written out to disk when it is added, so it's not as if *all* historical versions are appended to the database file every time a document is modified: only a single old version would be appended (as an attachment), along with the new doc, of course. The other good thing about storing historical versions as attachments is that they would get replicated. Currently we don't replicate old MVCC versions, so that would have to change, in addition to preventing them from being compacted, as you say.

Good point about storing attachments in the history. This could become a space issue if we simply write the historical versions as JSON docs with the attachments embedded as base64. A better approach would be to store hashes, and to store the attachments themselves separately from the historical versions (using some kind of prefix). This way we only write a new historical attachment if it changes.
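
A minimal sketch of the hash idea, in Python and purely in memory. The `HistoryStore` class, the `history-att/` prefix, and the choice of SHA-256 are assumptions for illustration only; nothing here reflects how CouchDB stores attachments.

```python
import hashlib


class HistoryStore:
    """Toy model: historical versions reference attachments by hash."""

    def __init__(self):
        self.blobs = {}    # "history-att/<sha256>" -> attachment bytes
        self.history = {}  # doc_id -> list of {attachment name: blob key}

    def archive(self, doc_id, attachments):
        """Record one historical version; return how many blobs were written."""
        written = 0
        refs = {}
        for name, data in attachments.items():
            key = "history-att/" + hashlib.sha256(data).hexdigest()
            if key not in self.blobs:  # only write content we have not seen
                self.blobs[key] = data
                written += 1
            refs[name] = key
        self.history.setdefault(doc_id, []).append(refs)
        return written


store = HistoryStore()
first = store.archive("doc1", {"logo.png": b"v1 bytes"})
second = store.archive("doc1", {"logo.png": b"v1 bytes"})  # unchanged
third = store.archive("doc1", {"logo.png": b"v2 bytes"})   # changed
```

Here the second call writes nothing, because the attachment's hash is already present; only the changed attachment in the third call costs a new write.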

All in all, it seems to me that reusing _rev for history saves us having to do an additional read and an additional write (reading the old doc or attachment and then writing it as an attachment). Is this a good enough reason to reuse _rev for this?
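
As a rough illustration of that cost difference, the Python sketch below counts reads and writes for one update under each scheme. The `CountingStore` class, both update functions, and the key formats are invented for this example. The model also ignores any reads the database would do anyway (for instance to check the current revision), so it only shows the *additional* I/O.

```python
class CountingStore:
    """Toy key-value store that counts its reads and writes."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key)

    def write(self, key, value):
        self.writes += 1
        self.data[key] = value


def update_with_attachment_history(store, doc_id, version, body):
    """History as attachments: copy the old doc aside, then write the new one."""
    old = store.read(doc_id)                                 # additional read
    store.write(f"{doc_id}/history/{version - 1}", old)      # additional write
    store.write(doc_id, body)


def update_with_rev_history(store, doc_id, version, body):
    """History via revisions: the old revision stays put; write the new one."""
    store.write(f"{doc_id}@{version}", body)


att = CountingStore()
att.data["doc"] = {"v": 1}      # pre-existing document, not counted
update_with_attachment_history(att, "doc", 2, {"v": 2})

rev = CountingStore()
rev.data["doc@1"] = {"v": 1}    # pre-existing revision, not counted
update_with_rev_history(rev, "doc", 2, {"v": 2})
```

In this model the attachment scheme costs one read and two writes per update, against a single write when the old revision is simply left in place.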

Thanks,
--
Jason Davies

www.jasondavies.com
