On Tue, Dec 13, 2011 at 20:08, Alex Besogonov <[email protected]> wrote:
> On Mon, Dec 12, 2011 at 10:26 PM, Paul Davis
> <[email protected]> wrote:
>>> * Merkle trees are great for two-way synchronization, but it's not
>>> immediately clear to me how you'd use them to bootstrap a single source ->
>>> target replication. I might just be missing a straightforward extension of
>>> the tech here.
>>
>> This is the point that's important with checksums and so on. Merkle
>> trees are great when you want to mirror structured data, but CouchDB
>> replication isn't a mirror operation. Think of N dbs replicating to a
>> central DB: you have a mixture of things which breaks checksums (or at
>> least any obvious application I can think of given our internal
>> structures).
>
> Uhm. What are the things that break checksums? Right now revision IDs
> are _almost_ deterministic, and it's not that hard to make them
> completely deterministic. And for replication purposes nothing else
> matters.
>
> To be exact: the only entity used for ID generation is the '[Deleted,
> OldStart, OldRev, Body, Atts2]' tuple, and only the 'Atts2' field can
> be non-deterministic. And that can be fixed (along with other minor
> forward-looking features like explicit versioning).
>
> Then it's easy to devise a protocol to replicate based on hash trees.
> I'm thinking about this protocol:
>
> 1) The current state of the replicated database is identified by a
> hash. Suppose that we have unidirectional replication A->B.
>
> Let's denote the state of the initial database A as A1 and B's as B1.
>
> We store the ancestry as a list of hashes outside the database (so it
> doesn't influence the hash of the database).
>
> 2) As the first step, B sends its list of replication ancestry.
>
> It's actually not even required to send the whole hashes each time;
> just send the first 4 bytes of each hash. That way even 1 million
> records of replication history would take only 4 MB.
> The 'A' server then replies with its own set of hashes with the
> matching initial bytes. If there are none, then the client falls back
> to the usual replication.
>
> So at this step 'B' knows the most recent common ancestor and requests
> the changes that have happened since that point in time. Each
> changeset, naturally, has its own hash.
>
> 3) After these changes are applied and merged, B's state is the A1
> state plus all the B's changes that might have happened ever since.
> Then B stores the hashes of the changesets that have been applied.
>
> That's it. Should work, as far as I see (it's 3am local time, so I
> might miss something).
>
> Overhead: 16 bytes of hash information for each changeset.
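For concreteness, step (2) and the prefix-matching reply could look something like the sketch below. This is only a toy model of the proposal: the ancestry lists, the 16-byte state hashes, and the 4-byte prefix length are taken from the description above, but `ancestry_prefixes` and `find_common_ancestor` are names I've invented for illustration, not anything in CouchDB.

```python
import hashlib

def ancestry_prefixes(ancestry, n=4):
    """B's side: send only the first n bytes of each ancestry hash."""
    return [h[:n] for h in ancestry]

def find_common_ancestor(a_ancestry, b_prefixes, n=4):
    """A's side: return the most recent ancestry hash whose prefix B
    also sent, or None, in which case the peers fall back to the usual
    replication."""
    wanted = set(b_prefixes)
    for h in reversed(a_ancestry):  # newest first
        if h[:n] in wanted:
            return h
    return None

# Three replication checkpoints on A; B has only seen the first two,
# so the second checkpoint is the most recent common ancestor.
history = [hashlib.sha256(tag).digest()[:16] for tag in (b"r1", b"r2", b"r3")]
common = find_common_ancestor(history, ancestry_prefixes(history[:2]))
```

Note that with 4-byte prefixes, false matches are possible but rare (roughly one in 2^32 per pair), and A can confirm by echoing the full hash before either side trusts the match.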
I think you're still missing the point made above about mirrors, unless I misunderstand. B may have other changes interleaved with those received from A, whether from interactive updates or other replications, which makes its hashes differ from A's.
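To make the interleaving problem concrete, here's a toy demonstration. The `db_state_hash` below (an ordered fold over changeset hashes) is a stand-in I made up, not CouchDB's actual scheme, but any hash over B's full history behaves the same way:

```python
import hashlib

def db_state_hash(changesets):
    """Invented state hash: fold the changeset hashes together in order."""
    h = hashlib.sha256()
    for c in changesets:
        h.update(hashlib.sha256(c).digest())
    return h.hexdigest()

# A replicates two changesets to B...
a_state = db_state_hash([b"doc1@1-aaa", b"doc2@1-bbb"])

# ...but a local interactive write landed on B in between, so B's
# history interleaves a changeset A has never seen.
b_state = db_state_hash([b"doc1@1-aaa", b"doc9@1-local", b"doc2@1-bbb"])

# B now holds everything A sent, yet the state hashes no longer match,
# so the ancestry exchange finds no common entry to anchor on.
```

The same divergence happens with any other replication source feeding B, which is the N-dbs-to-one-central-DB case from earlier in the thread.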
