On Dec 12, 2011, at 10:10 PM, Jason Smith wrote:

> On Tue, Dec 13, 2011 at 8:40 AM, Paul Davis <[email protected]> wrote:
>>> If there were a hypothetical single query which let the receiver
>>> assess its exact relationship to an arbitrary sender's data, I don't
>>> think "starts over" would sound as awful.
>>> 
>> 
>> I agree wholeheartedly. And the easiest way I see to make that
>> happen is to decouple the host and db identities in such a way that
>> this is a reality. It's possible there's something elegant we could
>> pull from things like Merkle trees. I've spent time considering it and
>> haven't thought of anything, but I'd be tickled pink if there were a
>> reasonable solution there.
> 
> Yeah. That is why I keep thinking of a checksum that works well with
> incremental map/reduce. I always recall that CRC32 is a commutative,
> associative checksum algorithm. It could hypothetically give you a
> checksum of the entire tree, and all subtrees down to the leaves, as a
> Couch reduce function. So the idea is to reduce the by_seq index. You
> get checksums of the database or subsets free or cheap.
> 
> At this point I am out of my expertise though so I defer.
> 
> -- 
> Iris Couch

Yep, that's a Merkle tree, and brings us back to where this thread sat 24 hours 
ago.  Couple of points:

* You want to stuff the checksums in the id_tree, not the seq_tree.  If you use 
the seq_tree you'll never be able to apply updates that get the checksums 
aligned.

* Merkle trees are great for two-way synchronization, but it's not immediately 
clear to me how you'd use them to bootstrap a single source -> target 
replication.  I might just be missing a straightforward extension of the tech 
here.
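
To illustrate the first point, here is a minimal Python sketch, not CouchDB 
code.  It assumes two hypothetical replicas that hold the same documents at the 
same revisions but wrote them in a different local order, so their update 
sequences differ.  The names (leaf, combine, digest, replica_a, replica_b) are 
made up for the example, and the reduce step is XOR over SHA-256 leaf hashes 
rather than CRC32, chosen only because XOR is trivially commutative and 
associative.

```python
import hashlib
from functools import reduce


def leaf(*fields: str) -> int:
    """Hash the given fields of one index entry to a 256-bit integer."""
    data = "\x00".join(fields).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def combine(a: int, b: int) -> int:
    """Commutative, associative reduce step (XOR), so subtree order is irrelevant."""
    return a ^ b


def digest(leaves) -> int:
    """Reduce an iterable of leaf hashes to one checksum; empty input gives 0."""
    return reduce(combine, leaves, 0)


# Hypothetical replicas as (local_seq, doc_id, rev) rows: identical content,
# different local update order.
replica_a = [(1, "alpha", "1-a"), (2, "beta", "1-b"), (3, "gamma", "2-c")]
replica_b = [(1, "gamma", "2-c"), (2, "alpha", "1-a"), (3, "beta", "1-b")]

# Keyed by id: the leaves depend only on (doc_id, rev).
by_id_a = digest(leaf(doc_id, rev) for _, doc_id, rev in replica_a)
by_id_b = digest(leaf(doc_id, rev) for _, doc_id, rev in replica_b)

# Keyed by seq: the leaves also depend on the replica-local sequence number.
by_seq_a = digest(leaf(str(seq), doc_id, rev) for seq, doc_id, rev in replica_a)
by_seq_b = digest(leaf(str(seq), doc_id, rev) for seq, doc_id, rev in replica_b)

print(by_id_a == by_id_b)    # id-keyed checksums agree
print(by_seq_a == by_seq_b)  # seq-keyed checksums do not
```

The sketch flattens the tree into a single reduce.  In a real B-tree the 
seq-keyed index would also group entries into different subtrees on each 
replica, which only makes the mismatch worse.  A plain XOR digest is also weak 
on its own (two identical leaves cancel out), so treat it as a stand-in for 
whatever combine function would actually be used.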

Adam
