On Feb 9, 2009, at 12:31 AM, Adam Kocoloski wrote:
Ok, thanks for the clarification. I don't see any major downsides
beyond the ones you already mentioned. The inability to replicate
between versions is a bit of a bummer -- I'd want to at least look
into a bridge that lets old servers replicate to new ones.
Your point about reducing the chance of collision is a good one,
especially since Couch is using a 32 bit sample space for revision
IDs. The probability of zero collisions between any two revisions
in a given document history is
N!/((N-M)! * N^M)
with N = 2**32 and M = "max rev history". With M = 128, that
probability drops to 0.999998. In a 400k document DB where each doc
has the max number of revisions it's likely that at least one has a
duplicate rev. That's no good. I think we could eventually see
transient cases of revisions being skipped by the replicator with
the trunk code.
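
For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The function name and the 400k-document figure plugged in at the end are only for illustration; it assumes revision IDs are drawn uniformly and independently from the N-value space, and that every document carries the full M revisions.

```python
import math

def p_no_collision(n: int, m: int) -> float:
    """Probability that m IDs drawn uniformly from n values are all distinct.

    Same quantity as N!/((N-M)! * N^M), written as the product
    (1 - 0/N)(1 - 1/N)...(1 - (M-1)/N) and summed in log space so the
    tiny per-term differences are not lost to rounding.
    """
    log_p = sum(math.log1p(-k / n) for k in range(m))
    return math.exp(log_p)

N = 2**32      # 32 bit revision ID space
M = 128        # assumed max rev history
DOCS = 400_000 # assumed database size, each doc at max history

p_doc = p_no_collision(N, M)
# Chance that at least one of DOCS independent documents has a duplicate rev.
p_any = 1.0 - p_doc**DOCS

print(f"P(no duplicate in one doc)    = {p_doc:.6f}")
print(f"P(duplicate somewhere in DB) = {p_any:.2f}")
```

Under those assumptions the per-document figure comes out at roughly 0.999998, and the chance of at least one duplicate somewhere in the database is a little over one half.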
Adding the revseq doesn't reduce the chances of a duplicate rev, but
it does mean that replication won't accidentally match revisions
from different revseqs. Instead, the concern would be that two
different servers would generate the same revision ID from different
updates at the same revseq. It's a concern only for multi-master
setups, and even then each document that had been updated on both
source and target would only have a 1/N chance of being skipped due
to an accidentally matching revision. With N = 2**32, I guess it would
happen once every 4 billion times or so.
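
A rough sketch of the difference, in Python. This is a toy model, not Couch's replicator: the (revseq, rev_id) tuples, the function names, and the sample histories are all made up for illustration.

```python
# Toy model: each revision is a (revseq, rev_id) pair. Hypothetical data,
# not the actual CouchDB revision format or replication code.

def shared_by_id_only(source, target):
    """Revisions treated as common when only the random ID is compared."""
    return {rev_id for _, rev_id in source} & {rev_id for _, rev_id in target}

def shared_by_revseq_and_id(source, target):
    """Revisions treated as common when revseq and ID must both match."""
    return set(source) & set(target)

source = [(1, 111), (2, 222), (3, 333)]
# On the target, ID 222 shows up again by chance, but at a different revseq.
target = [(1, 111), (2, 999), (3, 222)]

print(shared_by_id_only(source, target))        # IDs 111 and 222 both match
print(shared_by_revseq_and_id(source, target))  # only (1, 111) matches
```

With the revseq included, the accidental match on 222 is no longer mistaken for shared history. What remains is the case where two servers pick the same ID at the same revseq, which is the 1/N chance per doubly-updated document described above.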
Or Couch could switch to a 64 bit space for the revision IDs ;-)
There is nothing preventing larger revs (or even non-integer revs) as
it's just stored as a string (real efficient I know). The size could
easily be a server or database setting.
-Damien
Adam
On Feb 8, 2009, at 2:40 PM, Damien Katz wrote:
I don't think it's strictly necessary, but it makes merging new
edits simpler and it significantly reduces the chances of
collisions between revision ids, so there is less ambiguity. What
downsides do you see?
-Damien
On Feb 8, 2009, at 2:28 PM, Adam Kocoloski wrote:
Hi Damien, it seems to me that you're conflating two separate
issues. I agree that the revision history should be trimmed, and
that this will potentially introduce spurious conflicts when two
servers have no shared history for a document. I don't see how
this change by itself requires the addition of a revseq to the
JSON revision format. Is it really required?
Adam