On Feb 9, 2009, at 12:31 AM, Adam Kocoloski wrote:

Ok, thanks for the clarification. I don't see any major downsides beyond the ones you already mentioned. The inability to replicate between versions is a bit of a bummer -- I'd want to at least look into a bridge that lets old servers replicate to new ones.

Your point about reducing the chance of collision is a good one, especially since Couch is using a 32-bit sample space for revision IDs. The probability of zero collisions between any two revisions in a given document history is

N!/((N-M)! * N^M)

with N = 2**32 and M = "max rev history". With M = 128, that probability drops to 0.999998. In a 400k-document DB where each doc has the max number of revisions, it's more likely than not (about 53%) that at least one doc has a duplicate rev. That's no good. I think we could eventually see transient cases of revisions being skipped by the replicator with the trunk code.
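The numbers above follow from the birthday-problem formula. A minimal sketch for checking them, assuming rev IDs are drawn uniformly and independently from N values and that documents are independent of each other (the function names are hypothetical), evaluates N!/((N-M)! * N^M) as the product of (1 - i/N) in log space so the tiny per-term differences are not lost to rounding:

```python
import math

def p_no_collision(n: int, m: int) -> float:
    """N!/((N-M)! * N^M) = prod_{i<M} (1 - i/N), computed in log space."""
    # log1p keeps precision when i/n is tiny (i/n ~ 1e-6 for 32-bit IDs)
    return math.exp(sum(math.log1p(-i / n) for i in range(m)))

def p_any_doc_collides(n: int, m: int, docs: int) -> float:
    """Chance that at least one of `docs` independent histories has a duplicate rev."""
    log_p = sum(math.log1p(-i / n) for i in range(m))
    # 1 - p**docs, computed with expm1 to avoid cancellation near p = 1
    return -math.expm1(docs * log_p)

N, M = 2**32, 128
print(f"{p_no_collision(N, M):.6f}")                    # 0.999998
print(f"{p_any_doc_collides(N, M, 400_000):.2f}")       # 0.53
print(f"{p_any_doc_collides(2**64, M, 400_000):.1e}")   # 1.8e-10
```

The last line shows the same 400k-document case with 64-bit IDs, where the chance of any duplicate becomes negligible.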

Adding the revseq doesn't reduce the chances of a duplicate rev, but it does mean that replication won't accidentally match revisions from different revseqs. Instead, the concern would be that two different servers would generate the same revision ID from different updates at the same revseq. It's a concern only for multi-master setups, and even then each document that had been updated on both source and target would have only a 1/N chance of being skipped due to an accidentally matching revision. With N = 2**32, I guess it would happen once every 4 billion times or so.
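To illustrate why the revseq helps, here is a small sketch comparing "which source revs is the target missing?" with and without the revseq. It assumes a hypothetical `"<revseq>-<id>"` string format and hypothetical helper names; the thread does not fix a format, and this is not the replicator's actual code:

```python
from typing import NamedTuple

class Rev(NamedTuple):
    seq: int  # hypothetical revseq: position of the edit in the document's history
    rid: str  # random revision ID

def parse_rev(s: str) -> Rev:
    """Parse an assumed "<revseq>-<id>" string such as "2-9f02"."""
    seq, rid = s.split("-", 1)
    return Rev(int(seq), rid)

def missing_by_id(source: list[str], target: list[str]) -> list[str]:
    """Match on the random ID alone, as if there were no revseq."""
    have = {parse_rev(r).rid for r in target}
    return [r for r in source if parse_rev(r).rid not in have]

def missing_by_seq_and_id(source: list[str], target: list[str]) -> list[str]:
    """Match on (revseq, id) pairs, so equal IDs at different revseqs never match."""
    have = {parse_rev(r) for r in target}
    return [r for r in source if parse_rev(r) not in have]

source = ["1-3ae1", "2-9f02", "3-77c4"]
target = ["1-3ae1", "2-77c4"]  # ID "77c4" collides with the source's revseq-3 edit
print(missing_by_id(source, target))          # ['2-9f02']  (3-77c4 wrongly skipped)
print(missing_by_seq_and_id(source, target))  # ['2-9f02', '3-77c4']
```

The residual case described above, two servers drawing the same ID at the same revseq, would still match in the second function; with 32-bit IDs that is a 1 in 2**32 (about 4.29 billion) chance per such document.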

Or Couch could switch to a 64-bit space for the revision IDs ;-)

There is nothing preventing larger revs (or even non-integer revs), as it's just stored as a string (really efficient, I know). The size could easily be a server or database setting.
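Since the rev is just a string, widening the ID space only changes the string's length. A minimal sketch, assuming a hypothetical `new_rev_id` helper that draws random bits and hex-encodes them (the hex encoding and the `bits` parameter standing in for a server or database setting are assumptions, not CouchDB's actual scheme):

```python
import secrets

def new_rev_id(bits: int = 32) -> str:
    """Hypothetical helper: a random revision ID of `bits` bits as a hex string."""
    if bits <= 0 or bits % 4:
        raise ValueError("bits must be a positive multiple of 4")
    # randbits draws uniformly from [0, 2**bits); zero-pad to a fixed width
    return format(secrets.randbits(bits), f"0{bits // 4}x")

print(new_rev_id())    # 8 random hex characters (32 bits)
print(new_rev_id(64))  # 16 random hex characters (64 bits)
```

Two independently generated IDs collide with probability 1/2**bits, so going from 32 to 64 bits costs 8 extra characters per rev.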

-Damien




Adam

On Feb 8, 2009, at 2:40 PM, Damien Katz wrote:

I don't think it's strictly necessary, but it makes merging new edits simpler and significantly reduces the chances of collisions between revision IDs, so there is less ambiguity. What downsides do you see?

-Damien

On Feb 8, 2009, at 2:28 PM, Adam Kocoloski wrote:

Hi Damien, it seems to me that you're conflating two separate issues. I agree that the revision history should be trimmed, and that this will potentially introduce spurious conflicts when two servers have no shared history for a document. I don't see how this change by itself requires the addition of a revseq to the JSON revision format. Is it really required?
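The spurious-conflict effect of trimming can be shown with a toy model. This sketch assumes oldest-first lists of rev IDs and hypothetical `trim`/`compare` helpers; it is a simplification for illustration, not CouchDB's merge algorithm:

```python
def trim(history: list[str], max_revs: int) -> list[str]:
    """Keep the newest `max_revs` rev IDs of an oldest-first history."""
    if max_revs < 1:
        raise ValueError("max_revs must be at least 1")
    return history[-max_revs:]

def compare(local: list[str], incoming: list[str]) -> str:
    """Hypothetical merge check between two oldest-first rev histories."""
    if incoming[-1] in local:
        return "already have"
    if local[-1] in incoming:
        return "fast-forward"  # incoming descends from our current rev
    return "conflict"          # no shared rev found in the (possibly trimmed) history

full = ["r1", "r2", "r3", "r4", "r5", "r6"]
local = full[:3]                      # this server stopped at r3
print(compare(local, full))           # fast-forward
print(compare(local, trim(full, 2)))  # conflict (spurious: r6 really descends from r3)
```

Once the shared ancestor falls off the trimmed history, the two servers have no common rev to match on, which is the case described above.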

Adam
