Hi Brian,
On 4 Aug 2009, at 10:56, Brian Candler wrote:
On Mon, Aug 03, 2009 at 06:21:34PM +0100, Jason Davies wrote:
Comments welcomed!
ISTM that the "historical" versions are already stored, so why duplicate
them in the form of an attachment to a new version? And what about
historical versions of attachments anyway?
Wouldn't it be simpler to:
- keep the historical versions by _rev as they are now
- somehow mark these historical versions as worth keeping or not
(could be as simple as reusing the _deleted flag)
- make the "worth keeping" versions survive compaction
Then when you PUT a document, you'd have two options: apply the _deleted
flag automatically to the old revision, or not. This could be chosen by
URL parameter perhaps.
Some views might want access to historical revs, but perhaps this should
be controlled by a view parameter to filter them out for views which are
only interested in the most recent one. (Incidentally, I would like views
to have access to live conflicting revs too, but that's a separate issue)
I like the simplicity of your idea, but I'd be interested to hear
Damien's opinion on essentially using MVCC revisions as history too.
Is there a potential difficulty with doing this that we're missing?
You said that it seems unnecessary to duplicate the historical
versions as attachments. Yes, you may have a point, but in the
current way of doing things the duplicates would be removed after
compaction. If I understand things correctly, only new attachments
get written out to disk, at the point they are added, so it's not as if
*all* historical versions are appended to the database file every time
a document is modified; only a single old version would be appended
(as an attachment), as well as the new doc, of course. The other good
thing about storing historical versions as attachments is that they
would get replicated. Currently we don't replicate old MVCC versions;
that would have to change, and we would also have to prevent them from
being compacted, as you say.
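
To make the "survive compaction" idea concrete, here is a rough toy
sketch. It is not how CouchDB's compactor actually works; the `compact`
function, the list-of-dicts layout and the `keep` flag are all
hypothetical names made up for illustration.

```python
def compact(revisions):
    """Toy compaction over one document's revisions, oldest first.

    Each revision is a dict with a 'rev' string and an optional boolean
    'keep' flag (hypothetical; stands in for "worth keeping").
    The latest revision always survives; older ones only if flagged.
    """
    if not revisions:
        return []
    latest = revisions[-1]
    kept_history = [r for r in revisions[:-1] if r.get("keep")]
    return kept_history + [latest]


revs = [
    {"rev": "1-a", "keep": True},   # marked as history worth keeping
    {"rev": "2-b", "keep": False},  # ordinary old revision, discarded
    {"rev": "3-c"},                 # current revision
]
surviving = [r["rev"] for r in compact(revs)]
```

Replication would need an equivalent rule so that flagged old revisions
are sent along with the current one.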
Good point about storing attachments in the history: this could
potentially become a space issue if we simply write the historical
versions as JSON docs with their attachments embedded as base64. A
better approach would be to store hashes and keep the attachments
themselves separate from the historical versions (using some kind
of prefix). This way we only write a new historical attachment if it
changes.
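
A minimal sketch of the hash idea, using an in-memory dict rather than
a real database. The `HistoryStore` class, the `att/` key prefix and
the choice of SHA-1 are assumptions for illustration only, not an
existing CouchDB API.

```python
import hashlib


class HistoryStore:
    """Toy model: attachment bodies are stored once, keyed by hash."""

    def __init__(self):
        self.blobs = {}       # "att/<sha1 hex>" -> bytes
        self.history = []     # historical versions, holding only hash keys
        self.blob_writes = 0  # number of attachment bodies actually written

    def _put_blob(self, data):
        key = "att/" + hashlib.sha1(data).hexdigest()
        if key not in self.blobs:  # only write if the content is new
            self.blobs[key] = data
            self.blob_writes += 1
        return key

    def archive(self, doc, attachments):
        """Record a historical version; attachments maps name -> bytes."""
        refs = {name: self._put_blob(data) for name, data in attachments.items()}
        self.history.append({"doc": doc, "attachments": refs})
        return refs


store = HistoryStore()
store.archive({"_id": "a", "n": 1}, {"logo.png": b"first image"})
store.archive({"_id": "a", "n": 2}, {"logo.png": b"first image"})   # unchanged
store.archive({"_id": "a", "n": 3}, {"logo.png": b"second image"})  # changed
```

Three historical versions are recorded here, but the attachment body is
only written twice, because the second version reuses the first hash.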
All in all, it seems to me that reusing _rev for history saves us
having to do an additional read and an additional write (reading
the old doc or attachment and then writing it as an attachment). Is
this a good enough reason to reuse _rev for this?
Thanks,
--
Jason Davies
www.jasondavies.com