idea for transitive replication checkpoints

Adam Kocoloski Thu, 17 Feb 2011 18:45:57 -0800

Hi all, Paul and I were chatting at today's CouchDB Hack Night about a way to 
fast-forward replications (thanks Max for the prodding!).  It's non-trivial, 
but I think the benefit for big networks of CouchDB servers can be substantial.


The basic idea is that if A replicates with B, and B with C, then a new 
replication between A and C should not need to start from scratch.  I think we 
can accomplish this as follows:

1) Store the target update sequence along with the source sequence in the 
checkpoint document, at least in the copy of the checkpoint document kept on 
the target.  Each checkpoint record is then the tuple {Source, _local ID, 
Session ID, SourceSeq, TargetSeq}.  Using that syntax, suppose we have the 
following replication records:

On A
{A, _local/Foo, Bar, 5, _TargetSeq} % we could omit the target sequence
                                    % on the source

On B
{A, _local/Foo, Bar, 5, 10} % 5 on A corresponds to 10 on B
{B, _local/Baz, Bif, 15, _TargetSeq}

On C
{B, _local/Baz, Bif, 15, 7} % 15 on B corresponds to 7 on C

We know that A -> B happened before B -> C.

2) During the B -> C replication, when we reach source sequence number 10 (the 
point on B that corresponds to 5 on A), the _changes feed from B will deliver 
some extra information, such as

{A, _local/Foo, Bar, 5}

which will be stored at C. This may require a new disk-resident btree keyed on 
update sequence, or at least an in-memory index constructed by walking the 
_local docs btree.

3) When we trigger the A -> C replication, C will walk the full checkpoint 
records in its _local tree and find no mention of A, but then it will also 
consult the "transitive" checkpoints and find the {A, _local/Foo, Bar, 5} 
record.  It'll consult _local/Foo on A, find that the session ID Bar is still 
present, and conclude that it can fast-forward the replication and start from 
update sequence 5.  It will then remove that transitive checkpoint and replace 
it with a full regular checkpoint.
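
A minimal Python sketch of steps 1 through 3, assuming the example records above. The names `Checkpoint`, `TransitiveCheckpoint`, `build_seq_index`, `forwarded_records` and `fast_forward_seq` are hypothetical, as is the `source_sessions` map standing in for A's _local docs; the sorted list searched with `bisect` plays the role of the in-memory index keyed on update sequence. It only covers the case where C has no regular checkpoint for the source, and picking the highest validated sequence when several transitive records exist is an assumption the proposal doesn't address.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    """Full record: {Source, _local ID, Session ID, SourceSeq, TargetSeq}."""
    source: str
    local_id: str
    session_id: str
    source_seq: int
    target_seq: Optional[int]  # may be omitted in the copy kept on the source


class TransitiveCheckpoint(NamedTuple):
    """Record passed downstream: {Source, _local ID, Session ID, SourceSeq}."""
    source: str
    local_id: str
    session_id: str
    source_seq: int


def build_seq_index(checkpoints: list[Checkpoint]) -> tuple[list[int], list[Checkpoint]]:
    """Walk the checkpoint records once and order them by target update sequence.

    Hypothetical stand-in for the in-memory index built from the _local docs btree.
    """
    ordered = sorted((cp for cp in checkpoints if cp.target_seq is not None),
                     key=lambda cp: cp.target_seq)
    return [cp.target_seq for cp in ordered], ordered


def forwarded_records(index: tuple[list[int], list[Checkpoint]],
                      replicated_through: int) -> list[TransitiveCheckpoint]:
    """Step 2: records the source's _changes feed would have attached once the
    replication has passed each checkpoint's target sequence."""
    keys, ordered = index
    cut = bisect_right(keys, replicated_through)
    return [TransitiveCheckpoint(cp.source, cp.local_id, cp.session_id, cp.source_seq)
            for cp in ordered[:cut]]


def fast_forward_seq(source: str,
                     transitive: list[TransitiveCheckpoint],
                     source_sessions: dict[str, set[str]]) -> int:
    """Step 3: start sequence for a new replication from `source` when the
    target has no regular checkpoint for it; 0 means start from scratch.

    `source_sessions` maps each _local ID on the source to the session IDs
    it still records (a hypothetical view of the source's checkpoint docs).
    """
    valid = [tc.source_seq for tc in transitive
             if tc.source == source
             and tc.session_id in source_sessions.get(tc.local_id, set())]
    return max(valid, default=0)


# The example records on B: A's seq 5 corresponds to B's seq 10.
on_b = [Checkpoint("A", "_local/Foo", "Bar", 5, 10),
        Checkpoint("B", "_local/Baz", "Bif", 15, None)]

# B -> C replicated through B's seq 15, so C has received {A, _local/Foo, Bar, 5}.
on_c = forwarded_records(build_seq_index(on_b), replicated_through=15)

# A's _local/Foo still records session Bar, so A -> C can start at seq 5.
print(fast_forward_seq("A", on_c, {"_local/Foo": {"Bar"}}))  # 5
```

Replacing the transitive record with a regular checkpoint once the A -> C replication runs is left out of the sketch for brevity.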

If server A crashes after the A -> B replication and restores from a backup 
taken before that replication, the session ID Bar will be missing from 
_local/Foo, so when we try to do the A -> C replication we won't fast-forward.  
This is the correct behavior.
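
The backup case comes down to asking whether A's _local/Foo still records session Bar. A minimal Python sketch follows; the `history`, `session_id` and `recorded_seq` layout, the "Older" session and the `session_present` helper are illustrative assumptions, not a description of CouchDB's actual replication log format.

```python
import copy

# Hypothetical, simplified layout for A's _local/Foo: a history of
# checkpoint entries, newest first.
foo_before = {"history": [{"session_id": "Older", "recorded_seq": 2}]}

# A backup of A is taken before the A -> B replication runs.
backup = copy.deepcopy(foo_before)

# The A -> B replication (session Bar) then checkpoints at A's seq 5.
foo_after = copy.deepcopy(foo_before)
foo_after["history"].insert(0, {"session_id": "Bar", "recorded_seq": 5})


def session_present(local_doc: dict, session_id: str) -> bool:
    """True if the source's checkpoint doc still records this session."""
    return any(entry["session_id"] == session_id
               for entry in local_doc.get("history", []))


print(session_present(foo_after, "Bar"))  # True: C may fast-forward to seq 5
# A crashes and restores from the backup, losing the Bar entry.
print(session_present(backup, "Bar"))     # False: A -> C starts from scratch
```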

Hopefully this is comprehensible to someone other than me.  We spent some time 
trying to poke holes in it, but it's entirely possible there are other things 
we didn't consider that will prevent it from working.  Cheers,

Adam
