On Feb 9, 2009, at 1:34 AM, Antony Blakey wrote:

On 09/02/2009, at 4:01 PM, Adam Kocoloski wrote:

Ok, thanks for the clarification. I don't see any major downsides beyond the ones you already mentioned. The inability to replicate between versions is a bit of a bummer -- I'd want to at least look into a bridge that lets old servers replicate to new ones.

Your point about reducing the chance of collision is a good one, especially since Couch is using a 32-bit sample space for revision IDs. The probability of zero collisions between any two revisions in a given document history is

N!/((N-M)! * N^M)

with N = 2**32 and M = "max rev history". With M = 128, that probability drops to 0.999998. In a 400k document DB where each doc has the max number of revisions, the odds are better than even (roughly 53%) that at least one doc has a duplicate rev. That's no good. I think we could eventually see transient cases of revisions being skipped by the replicator with the trunk code.
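
As a sanity check on those figures, here is a minimal Python sketch of the calculation. It assumes revision IDs are drawn uniformly and independently from the N possible values, which is an assumption of this sketch rather than something stated about the trunk code, and the function names are purely illustrative. The factorial expression is evaluated as a product of (1 - k/N) terms in log space so that nothing overflows.

```python
import math


def log_p_no_collision(m: int, n: int) -> float:
    """Log of n! / ((n - m)! * n**m): the chance that m uniform draws
    from n possible values are all distinct (assumes m <= n)."""
    # n!/((n-m)! * n^m) = prod_{k=0}^{m-1} (1 - k/n); sum the logs instead.
    return sum(math.log1p(-k / n) for k in range(m))


def p_no_collision(m: int, n: int) -> float:
    """Probability of zero collisions within one document's rev history."""
    if m > n:
        return 0.0  # pigeonhole: more draws than values forces a collision
    return math.exp(log_p_no_collision(m, n))


def p_any_doc_collides(docs: int, m: int, n: int) -> float:
    """Probability that at least one of `docs` independent documents,
    each with m revisions, contains a duplicate revision ID."""
    if m > n:
        return 1.0
    # 1 - p**docs, written with expm1 to keep precision when p is near 1.
    return -math.expm1(docs * log_p_no_collision(m, n))


if __name__ == "__main__":
    N = 2**32   # 32-bit revision ID space
    M = 128     # stand-in for "max rev history"
    print(p_no_collision(M, N))
    print(p_any_doc_collides(400_000, M, N))
```

Under those assumptions the first value comes out at about 0.999998 and the second at a bit over one half. Passing a larger N (for example 2**64) shows how quickly the per-document collision chance shrinks as the ID space grows.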

If the revision were an SHA hash (admittedly), wouldn't the increased value space, AND the fact that identical rev == identical document, greatly relieve this problem?

Yes, we do plan to use a hash of the document content for the revision at some point. You're right, we'd need to also increase the value space at the same time to actually relieve the collision problem. 160 bits (or more) may be overkill, though. We'll have to find some middle ground balancing collision probability and resource usage.

Best,
Adam
