On Feb 4, 2010, at 5:05 PM, Randall Leeds wrote:

> On Thu, Feb 4, 2010 at 08:17, Adam Kocoloski <[email protected]> wrote:
>> 
>> If we went ahead and implemented this I think the UUID becomes superfluous 
>> from the replicator's perspective.  You wouldn't want to restrict this 
>> Merkle tree check to UUID-matched DBs, as it would be useful for reducing 
>> entropy in a sharded database cluster that stores multiple copies of each 
>> document in different database shards.  In fact, IIRC that was a Dynamo 
>> feature in the original Amazon paper.
> 
> I mostly follow and I think I agree.
> Can you clarify "as it would be useful for reducing entropy..."?
> 
> Randall

Sure, that was too terse on my part.  I'm referring to the case where you
promise to write N copies of a document in your cluster but, for whatever
reason, only W < N of those writes succeed.  Hence "entropy" -- the N shards
start diverging from one another after transient failures.

You want those missing writes to eventually propagate to the N-W shards that 
didn't get them.  CouchDB's _changes replication works for this purpose, but 
it's relatively resource-intensive because it checks for the existence of every 
update on the target.  I suspect that comparing Merkle trees may be a more 
efficient way to figure out what to replicate in this special case where the 
two DBs are always supposed to be identical.

Cheers,

Adam
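
P.S. To make the Merkle tree comparison concrete, here is a minimal Python
sketch of the idea.  It is not CouchDB code or any existing API: the function
names (bucket_of, build_tree, differing_buckets), the fixed power-of-two bucket
count, and the simplification of a shard to a dict of doc id -> revision
string are all assumptions made for illustration.  Each shard hashes its
documents into buckets, builds a hash tree over the buckets, and two shards
then walk the trees from the root, descending only where the hashes differ.

```python
import hashlib


def bucket_of(doc_id, num_buckets):
    # Hypothetical placement rule: hash the doc id into one of num_buckets leaves.
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def build_tree(docs, num_buckets=8):
    """Build a hash tree over a shard, given as {doc_id: rev}.

    Returns a list of levels: levels[0] holds the leaf (bucket) hashes and
    levels[-1] holds the single root hash.  num_buckets is assumed to be a
    power of two so every level pairs up evenly.
    """
    buckets = [[] for _ in range(num_buckets)]
    for doc_id, rev in docs.items():
        buckets[bucket_of(doc_id, num_buckets)].append((doc_id, rev))

    leaves = []
    for bucket in buckets:
        h = hashlib.sha1()
        for doc_id, rev in sorted(bucket):  # sort so insertion order is irrelevant
            h.update(f"{doc_id}\x00{rev}\n".encode("utf-8"))
        leaves.append(h.digest())

    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha1(prev[i] + prev[i + 1]).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(tree_a, tree_b):
    """Return the leaf bucket indexes whose contents differ between two shards.

    Starts at the root and descends only into subtrees whose hashes disagree,
    so identical shards are detected with a single root comparison.
    """
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []
    suspects = [0]
    for level in range(top - 1, -1, -1):
        next_suspects = []
        for i in suspects:
            for child in (2 * i, 2 * i + 1):
                if tree_a[level][child] != tree_b[level][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


# Two copies of the same shard; the second one missed two writes.
shard_a = {f"doc{i}": "1-abc" for i in range(100)}
shard_b = {k: v for k, v in shard_a.items() if k not in ("doc7", "doc42")}

stale = differing_buckets(build_tree(shard_a), build_tree(shard_b))
print("buckets to repair:", stale)
```

Only the documents in the reported buckets would then need the usual
per-document check, rather than every update in the database.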
