Re: Unique instance IDs?

Adam Kocoloski Mon, 12 Dec 2011 19:13:00 -0800

On Dec 12, 2011, at 8:25 PM, Jason Smith wrote:

> On Tue, Dec 13, 2011 at 8:03 AM, Paul Davis <[email protected]> 
> wrote:
>> Having a UUID for every database created is the ideal
>> harmonious-to-theory manifestation of "what is a db?" but we have to
>> deal with reality when people may copy a file which makes things a bit
>> weird when there are two instances of a UUID db.
> 
> You didn't say "harsh reality," but to list some legitimate situations
> where people might copy .couch files:
> 
> * Restoring from backups
> * Cloning a VMware image
> * Booting an EC2 AMI
> * NAS storage clusters
> * Couchbase mobile bootstrapping
> 
>>> There's actually no problem with moving DBs around today, except that
>>> replication starts over (unless you change host names to match).
>> 
>> The "except that replication starts over" is a very significant caveat
>> that I would say contradicts the entire "no problem" description.
> 
> Nobody has shown that "replication starts over" is bad. The implicit
> assumption is that starting over is costly. At present, yes, that is
> true, but that's mostly a bunch of "no-op" round-trips diffing the
> revs.
> 
> If there were a hypothetical single query which let the receiver
> assess its exact relationship to an arbitrary sender's data, I don't
> think "starts over" would sound as awful.
> 
> -- 
> Iris Couch


Starting over is quite painful for a large target database. Streaming _changes
from the source is cheap, but the _missing_revs / _revs_diff API call involves
a bunch of random id_tree lookups on the target, one per document ID. If you've
got spinning rust and IDs that don't follow the sequence numbers, these "no-op"
checks basically top out around 100 IDs / sec / spindle. Assume 50 million
documents in the target DB: 50,000,000 / 100 per second is 500,000 seconds, or
almost six days on a single spindle. That's about a week of no-ops before the
real replication starts up again. Not pretty.
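A quick back-of-envelope check of those numbers, as a Python sketch. It assumes every document ID in the target costs one random seek and that throughput scales linearly with the number of spindles; `noop_catchup_days` is a hypothetical helper for illustration, not part of CouchDB.

```python
# Rough estimate of how long a restarted replication spends on "no-op"
# _revs_diff checks before any new data is transferred.
# Assumptions (not measured here): one random id_tree seek per document ID,
# ~100 seeks/sec per spinning disk, and linear scaling across spindles.

SECONDS_PER_DAY = 24 * 60 * 60


def noop_catchup_days(docs_in_target: int,
                      lookups_per_sec_per_spindle: float,
                      spindles: int = 1) -> float:
    """Return the days spent diffing revisions if each ID costs one seek."""
    total_rate = lookups_per_sec_per_spindle * spindles  # IDs checked per second
    seconds = docs_in_target / total_rate
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for spindles in (1, 2, 4):
        days = noop_catchup_days(50_000_000, 100, spindles)
        print(f"{spindles} spindle(s): {days:.1f} days")
```

With a single spindle this prints 5.8 days, which is where the "about a week" figure comes from; adding spindles only divides the number, it doesn't make the cost go away.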

Adam
