On Mar 17, 2008, at 2:48 PM, Alan Bell wrote:
Jan Lehnardt wrote:
You can do that, too. With attachments, you'd have it all in one
place and would not need to write your views so that they skip old
revisions. That said, it is certainly possible to store older
revisions in other documents, if that solves your problems.
Cheers
Jan
--
Well, I might be missing something about the way CouchDB handles
attachments, but this doesn't sound good to me. Adding attachments to
hold the revision history means that the attachments have to be
replicated each time a revision happens.
Right now, this is true. But with attachment-level incremental
replication, only attachments that have changed will replicate.
Also, a replication conflict is pretty much the same thing as a
revision: a client application would have no knowledge of a
replication conflict happening, but this would be good to see in a
wiki-like page history. I can imagine that in a distributed system it
would be very hard for the clients to maintain a revision history as
attachments.
I disagree about the difficulty. It's surprisingly simple conceptually.
The first thing is, every time you update the document, simply attach
the previous revision when you save. Eventually there will be a flag
you can pass in to do this automatically.
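A rough sketch of that save path, using plain Python dicts in place of a real CouchDB client (the `save_with_history` name and the in-memory `db` are purely illustrative, not part of any actual API):

```python
import copy
import json

def save_with_history(db, doc):
    """Save `doc` into `db` (a dict of id -> doc standing in for a
    database), attaching the superseded revision as an inline
    attachment first."""
    current = db.get(doc["_id"])
    if current is not None:
        # Tuck the previous revision into the new doc's attachments.
        atts = doc.setdefault("_attachments", {})
        atts["rev-%d" % current["_rev"]] = {
            "content_type": "application/json",
            "data": json.dumps(current),
        }
        doc["_rev"] = current["_rev"] + 1
    else:
        doc["_rev"] = 1
    db[doc["_id"]] = copy.deepcopy(doc)
```

Note that each saved revision carries its own attachments inside it, so the whole prior chain rides along with every save.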
Then, if there is a replication conflict to resolve, simply open the
two conflicting documents (manually if necessary), update your chosen
winner with any info you want to preserve from the loser (data,
revision histories, etc.), then delete the loser revision.
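The merge step might look something like this sketch, treating the two conflicting revisions as plain dicts (`resolve_conflict` is a hypothetical helper, not a CouchDB API):

```python
def resolve_conflict(winner, loser):
    """Carry any attachments (e.g. saved revision history) from the
    losing conflict revision over to the chosen winner; the caller
    then deletes the loser."""
    merged = dict(loser.get("_attachments", {}))
    # The winner's own entries take precedence on a name clash.
    merged.update(winner.get("_attachments", {}))
    winner["_attachments"] = merged
    return winner
```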
And that's it. The thing about this system is that you can get very
simple or very complicated with the revision-history aspects; it's up
to the application developer. The nice thing is that you generally
don't need to worry about concurrent or distributed updates with other
nodes attempting the same thing. The same rules still apply, and
eventually the conflicts will be resolved.
As for writing views to not pick up old revisions, I think all
applications should assume that all documents are at all times
carrying a bundle of prior versions and replication/save conflicts.
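To illustrate: a view's map function can simply ignore everything but the live fields. Real CouchDB views are written in JavaScript, but the idea sketched in Python (the `map_current` name is hypothetical):

```python
def map_current(doc, emit):
    """Map-function sketch: emit only the live, user-visible fields,
    skipping deleted docs and ignoring any revision-history
    attachments riding along under "_attachments"."""
    if doc.get("_deleted"):
        return
    emit(doc["_id"], {k: v for k, v in doc.items() if not k.startswith("_")})
```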
One of the nasty things in Notes is that most applications assume
that replication conflicts don't happen and can break when they do.
I think a major feature of CouchDB is sensible handling of
revisions and conflicts. Purging revisions and conflicts is going to
be necessary for some applications, but in others it is desirable to
retain all versions. It would be good, at least, to be able to specify
which databases to run compaction on and which to exclude.
The scheduling of compaction is something that will be external to the
core database code. Much of the work here isn't in the actual
file-level compaction code, but in creating tools to monitor things and
initiate compaction with the desired options.
What is the proposed rule for compaction? Just deleting all
revisions it finds? Deleting old revisions over a certain age?
For the first cut of compaction, it will unconditionally purge all
previous revisions of a document from a database, leaving only the
most recent revision of the winner and its conflicts.
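In other words, the first cut trims every branch of a document's revision history down to its tip. A toy sketch, modeling the history as a dict of branch name to a list of revisions (all names here are illustrative):

```python
def first_cut_compact(rev_tree):
    """First-cut compaction sketch: given branch name -> revisions
    (oldest first), keep only each branch's newest revision -- the
    winner's tip plus the tips of any conflict branches."""
    return {branch: revs[-1:] for branch, revs in rev_tree.items()}
```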
Then we will provide a way to perform selective purging during
compaction, probably via a user-provided function that will be fed each
document at compaction time and return true or false indicating whether
the document should be kept or discarded. This is also how deletion
"stubs" will be purged (keeping some meta information about deleted
documents is necessary for replication).
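That keep-or-discard contract could be sketched like this (the `compact` and `keep_doc` names and the `timestamp` field are assumptions for illustration, not the actual interface):

```python
def compact(docs, keep_fn):
    """Selective purge: feed each document to the user-provided
    predicate and discard those for which it returns False."""
    return [d for d in docs if keep_fn(d)]

def keep_doc(doc, cutoff=1000):
    """Example predicate: drop deletion stubs older than `cutoff`,
    keep live documents and recent stubs (replication still needs
    recent deletion metadata)."""
    if doc.get("_deleted") and doc.get("timestamp", 0) < cutoff:
        return False
    return True
```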
Another thought: it would be nice, perhaps, to run compaction on some
servers but not on others for replicas of the same database. That way a
bunch of offline clients could compact fairly frequently and
aggressively, while a central server they all replicate with, one with
lots of disk space, could retain all versions.
OK, that's a neat use case, but I'm not sure how you would handle the
intermediate edits replicating back to the server. Maybe they just get
lost. It seems possible to support such a thing without a lot of work.
We'll see what is possible.
I am thinking in particular of the scenario of OLPC XO laptops
replicating with a school server.
Alan.