On Feb 9, 2009, at 1:34 AM, Antony Blakey wrote:
On 09/02/2009, at 4:01 PM, Adam Kocoloski wrote:
Ok, thanks for the clarification. I don't see any major downsides
beyond the ones you already mentioned. The inability to replicate
between versions is a bit of a bummer -- I'd want to at least look
into a bridge that lets old servers replicate to new ones.
Your point about reducing the chance of collision is a good one,
especially since Couch is using a 32-bit sample space for revision
IDs. The probability of zero collisions between any two revisions
in a given document history is
N!/((N-M)! * N^M)
with N = 2**32 and M = "max rev history". With M = 128, that
probability drops to 0.999998. In a 400k document DB where each
doc has the max number of revisions it's likely that at least one
has a duplicate rev. That's no good. I think we could eventually
see transient cases of revisions being skipped by the replicator
with the trunk code.
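
For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It assumes revision IDs are drawn uniformly and independently from the 32-bit space, which is a simplification; the function name prob_no_collision is made up for illustration and is not anything in CouchDB.

```python
import math

def prob_no_collision(n_values: int, m_revs: int) -> float:
    """P(no two of m_revs uniformly random IDs out of n_values coincide)."""
    # N!/((N-M)! * N^M) equals prod_{k=0}^{M-1} (1 - k/N).
    # Summing logs avoids computing huge factorials.
    log_p = sum(math.log1p(-k / n_values) for k in range(m_revs))
    return math.exp(log_p)

N = 2**32        # 32-bit revision ID space
M = 128          # "max rev history"
DOCS = 400_000   # documents, each assumed to hold M revisions

p_doc = prob_no_collision(N, M)   # one document, no duplicate rev
p_db = 1.0 - p_doc ** DOCS        # at least one document with a duplicate

print(f"per-document P(no collision): {p_doc:.7f}")
print(f"P(some doc has a duplicate):  {p_db:.2f}")
```

Under those assumptions the per-document figure agrees with 0.999998, and the database-wide chance of at least one duplicate comes out a little above one half, which is what "likely" refers to here.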
If the revision were an SHA hash (admittedly), wouldn't the
increased value space, AND the fact that identical rev == identical
document, greatly relieve this problem?
Yes, we do plan to use a hash of the document content for the revision
at some point. You're right, we'd need to also increase the value
space at the same time to actually relieve the collision problem. 160
bits (or more) may be overkill, though. We'll have to find some
middle ground balancing collision probability and resource usage.
Best,
Adam
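
To put rough numbers on the middle-ground question, the sketch below compares a few revision ID widths using the usual birthday approximation. It reuses the 400k-document, 128-revision scenario from earlier in the thread and again assumes uniformly random IDs; the function name and the list of candidate widths are illustrative choices, not a proposal for what CouchDB should use.

```python
import math

def p_any_doc_collision(bits: int, max_revs: int, docs: int) -> float:
    """Approximate P(at least one document contains two equal rev IDs)."""
    # Birthday approximation: expected colliding pairs = docs * C(M, 2) / 2^bits,
    # and P(at least one) ~= 1 - exp(-expected).
    pairs_per_doc = max_revs * (max_revs - 1) // 2
    expected = docs * pairs_per_doc / 2**bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-expected)

for bits in (32, 64, 128, 160):
    p = p_any_doc_collision(bits, max_revs=128, docs=400_000)
    print(f"{bits:3d} bits ({bits // 8:2d} bytes per rev): {p:.3e}")
```

Each extra byte of revision ID divides the collision probability by roughly 256 once the probability is small, so the storage cost grows linearly while the risk falls exponentially.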