Hi all,

Paul and I were chatting at today's CouchDB Hack Night about a way to
fast-forward replications (thanks Max for the prodding!). It's non-trivial,
but I think the benefit for big networks of CouchDB servers can be substantial.
The basic idea is that if A replicates with B, and B with C, then a new
replication between A and C should not need to start from scratch. I think we
can accomplish this as follows:
1) Store the target update sequence along with the source sequence in the
checkpoint document, at least in the checkpoint document on the target. The
following tuple is important: {Source, _local ID, Session ID, SourceSeq,
TargetSeq}. Using that syntax let's say we have the following replication
records:
On A
{A, _local/Foo, Bar, 5, _TargetSeq} % we could omit the target sequence on the
                                    % source
On B
{A, _local/Foo, Bar, 5, 10} % 5 on A corresponds to 10 on B
{B, _local/Baz, Bif, 15, _TargetSeq}
On C
{B, _local/Baz, Bif, 15, 7} % 15 on B corresponds to 7 on C
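We know that A -> B happened before B -> C, because the A -> B checkpoint sits
at sequence 10 on B and the B -> C replication has already read B up to
sequence 15.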
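To make the records above concrete, here is a rough Python sketch (not CouchDB code). The Checkpoint type and the covered_by helper are names invented for illustration, and the ordering test assumes that "happened before" can be read off by comparing the incoming checkpoint's target sequence with the outgoing checkpoint's source sequence on the same node.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    """Hypothetical model of {Source, _local ID, Session ID, SourceSeq, TargetSeq}."""
    source: str
    local_id: str
    session_id: str
    source_seq: int
    target_seq: Optional[int]  # None where the record leaves it out (_TargetSeq)


# The two records held on B in the example above.
a_to_b = Checkpoint("A", "_local/Foo", "Bar", 5, 10)     # B was the target
b_to_c = Checkpoint("B", "_local/Baz", "Bif", 15, None)  # B was the source


def covered_by(incoming: Checkpoint, outgoing: Checkpoint) -> bool:
    """True if the incoming checkpoint's position on this node is at or below
    the sequence that the outgoing replication has already read."""
    if incoming.target_seq is None:
        return False
    return incoming.target_seq <= outgoing.source_seq


print(covered_by(a_to_b, b_to_c))
```

With the example numbers, 10 on B is below 15 on B, so everything A -> B wrote up to its checkpoint has already been shipped to C.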
2) During the B -> C replication, when we reach source sequence number 10, the
_changes feed from B will deliver some extra information like
{A, _local/Foo, Bar, 5}
which will be stored at C. This may require a new disk-resident btree keyed on
update sequence, or at least an in-memory index constructed by walking the
_local docs btree.
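One way to picture the index from step 2 is the Python sketch below. It is only an in-memory stand-in: a sorted list plus bisect plays the role of the btree keyed on update sequence, and TransitiveIndex, add, and between are hypothetical names, not anything in CouchDB.

```python
import bisect


class TransitiveIndex:
    """In-memory stand-in for an index of checkpoint records keyed on update sequence."""

    def __init__(self):
        self._seqs = []     # update sequences, kept sorted
        self._records = {}  # seq -> (source, local_id, session_id, source_seq)

    def add(self, seq, record):
        if seq not in self._records:
            bisect.insort(self._seqs, seq)
        self._records[seq] = record

    def between(self, since, upto):
        """Return (seq, record) pairs with since < seq <= upto, in sequence order."""
        lo = bisect.bisect_right(self._seqs, since)
        hi = bisect.bisect_right(self._seqs, upto)
        return [(s, self._records[s]) for s in self._seqs[lo:hi]]


# On B: sequence 10 is where the A -> B checkpoint landed.
index_on_b = TransitiveIndex()
index_on_b.add(10, ("A", "_local/Foo", "Bar", 5))

# A B -> C replication reading B from 0 through 15 passes sequence 10,
# so this is the extra information it would hand to C.
extras = index_on_b.between(0, 15)
print(extras)
```

A replication that resumes from a sequence at or past 10 would get nothing extra from this index, which is why C has to store the record the first time it sees it.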
3) When we trigger the A -> C replication, C will walk the full checkpoint
records in its _local tree and find no mention of A, but then it will also
consult the "transitive" checkpoints and find the {A, _local/Foo, Bar, 5}
record. It'll consult _local/Foo on A, find that the session ID Bar is still
present, and conclude that it can fast-forward the replication and start from
update sequence 5. It will then remove that transitive checkpoint and replace
it with a full regular checkpoint.
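The decision in step 3 could look roughly like the following Python sketch. The dictionaries standing in for C's full and transitive checkpoints, the sessions_on_source lookup, and the start_seq function are all assumptions made for illustration. A real implementation would read _local docs over the replication protocol, and would probably write the regular checkpoint as part of the replication itself rather than up front as done here.

```python
def start_seq(source, full, transitive, sessions_on_source):
    """Pick the source sequence a replication from `source` should start at.

    full:               source -> source_seq, the regular checkpoints on the target
    transitive:         source -> (local_id, session_id, source_seq)
    sessions_on_source: local_id -> set of session IDs still recorded on the source
    """
    if source in full:
        return full[source]  # ordinary resume from a regular checkpoint

    record = transitive.get(source)
    if record is None:
        return 0  # no knowledge of this source: start from scratch

    local_id, session_id, seq = record
    if session_id not in sessions_on_source.get(local_id, set()):
        return 0  # session is gone from the source: do not fast-forward

    # Promote the transitive checkpoint to a regular one.
    del transitive[source]
    full[source] = seq
    return seq


# State on C after B -> C, using the example records.
full_on_c = {"B": 15}
transitive_on_c = {"A": ("_local/Foo", "Bar", 5)}
print(start_seq("A", full_on_c, transitive_on_c, {"_local/Foo": {"Bar"}}))
```

If the source no longer lists the session ID under that _local document, the same call falls back to 0 and leaves the transitive record untouched.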
If server A crashes after the A -> B replication and restores from a backup
that was recorded before the replication, the session ID Bar will be missing
from _local/Foo, so when we try to do the A -> C replication we won't
fast-forward. This is the correct behavior.
Hopefully this is comprehensible to someone other than me. We spent some time
trying to poke holes in it, but it's entirely possible there are other things
we didn't consider that will prevent it from working.

Cheers,
Adam