On Feb 4, 2010, at 5:05 PM, Randall Leeds wrote:

> On Thu, Feb 4, 2010 at 08:17, Adam Kocoloski <[email protected]> wrote:
>>
>> If we went ahead and implemented this I think the UUID becomes superfluous
>> from the replicator's perspective. You wouldn't want to restrict this
>> Merkle tree check to UUID-matched DBs, as it would be useful for reducing
>> entropy in a sharded database cluster that stores multiple copies of each
>> document in different database shards. In fact, IIRC that was a Dynamo
>> feature in the original Amazon paper.
>
> I mostly follow and I think I agree.
> Can you clarify "as it would be useful for reducing entropy..."?
>
> Randall

Sure, that was too terse on my part. I'm referring to the case where you're promising to write N copies of a document in your cluster, but for whatever reason you only succeed W < N times. Hence "entropy" -- the N shards start diverging from one another after transient failures. You want those missing writes to eventually propagate to the N - W shards that didn't get them.

CouchDB's _changes replication works for this purpose, but it's relatively resource-intensive because it checks for the existence of every update on the target. I suspect that comparing Merkle trees may be a more efficient way to figure out what to replicate in this special case where the two DBs are always supposed to be identical.

Cheers, Adam
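
A rough sketch of the Merkle tree comparison described above, in Python. This is illustrative only and not CouchDB code: the function names (build_tree, differing_buckets, missing_on_target), the fixed bucket count, and the model of each shard as a plain doc_id -> rev mapping are all assumptions made for the example. Real CouchDB documents carry revision trees, which this ignores. The idea is that two identical shards agree at the root hash and need no further work, while diverged shards are only compared inside the buckets whose hashes differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def bucket_of(doc_id: str, n_buckets: int) -> int:
    # Stable assignment, so both shards build identically shaped trees.
    return int.from_bytes(_h(doc_id.encode("utf-8"))[:4], "big") % n_buckets


def build_tree(docs: dict, n_buckets: int = 8) -> list:
    """docs maps doc_id -> rev (hypothetical model of a shard).

    Returns a list of levels: levels[0] are the leaf (bucket) hashes,
    levels[-1] holds the single root hash.
    """
    assert n_buckets > 0 and n_buckets & (n_buckets - 1) == 0, "power of two"
    buckets = [[] for _ in range(n_buckets)]
    for doc_id, rev in docs.items():
        buckets[bucket_of(doc_id, n_buckets)].append(f"{doc_id}\0{rev}")
    levels = [[_h("\n".join(sorted(b)).encode("utf-8")) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(a: list, b: list) -> list:
    """Walk down from the root, descending only where the hashes differ."""
    assert len(a) == len(b), "trees must have the same shape"
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []  # roots match: the shards are identical, nothing to do
    frontier = [0]
    for level in range(top - 1, -1, -1):
        frontier = [
            child
            for i in frontier
            for child in (2 * i, 2 * i + 1)
            if a[level][child] != b[level][child]
        ]
    return frontier


def missing_on_target(source: dict, target: dict, n_buckets: int = 8) -> list:
    """Doc ids whose source rev is absent or different on the target."""
    bad = set(differing_buckets(build_tree(source, n_buckets),
                                build_tree(target, n_buckets)))
    return sorted(
        doc_id
        for doc_id, rev in source.items()
        if bucket_of(doc_id, n_buckets) in bad and target.get(doc_id) != rev
    )


if __name__ == "__main__":
    source = {"a": "1-x", "b": "1-y", "c": "2-z"}
    target = {"a": "1-x", "c": "1-q"}  # missed the write of b and the update of c
    print(missing_on_target(source, target))
```

In the common case where the shards really are identical, the check costs one root-hash comparison instead of a per-update existence check on the target. The trade-off is that each shard has to keep its tree up to date as writes arrive, which this sketch does not attempt.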
