On Dec 12, 2011, at 8:25 PM, Jason Smith wrote:

> On Tue, Dec 13, 2011 at 8:03 AM, Paul Davis <[email protected]> wrote:
>> Having a UUID for every database created is the ideal
>> harmonious-to-theory manifestation of "what is a db?" but we have to
>> deal with reality when people may copy a file which makes things a bit
>> weird when there are two instances of a UUID db.
>
> You didn't say "harsh reality," but to list some legitimate situations
> where people might copy .couch files:
>
> * Restoring from backups
> * Cloning a VMWare image
> * Booting an EC2 AMI
> * NAS storage clusters
> * Couchbase mobile bootstrapping
>
>>> There's actually no problem with moving DBs around today, except that
>>> replication starts over (unless you change host names to match).
>>
>> The "except that replication starts over" is a very significant caveat
>> that I would say contradicts the entire "no problem" description.
>
> Nobody has shown that "replication starts over" is bad. The implicit
> assumption is that starting over is costly. At present, yes, that is
> true, but that's mostly a bunch of "no-op" round-trips diffing the
> revs.
>
> If there were a hypothetical single query which let the receiver
> assess its exact relationship to an arbitrary sender's data, I don't
> think "starts over" would sound as awful.
>
> --
> Iris Couch

Starting over is quite painful for a large target database. Streaming _changes from the source is cheap, but the _missing_revs / _revs_diff API call involves a bunch of random id_tree lookups on the target. If you've got spinning rust and IDs that don't follow the sequence numbers, these "no-op" checks basically top out around 100 IDs / sec / spindle. Assume 50 MM documents in the target DB and you're looking at nearly a week of no-ops before the real replication starts up again. Not pretty.

Adam
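
For anyone who wants to check the estimate, here is a back-of-the-envelope sketch of the arithmetic. The function name and the `spindles` parameter are made up for illustration, and the assumption that throughput scales linearly with the number of spindles is mine, not a measured result; only the 100 IDs / sec / spindle and 50 MM figures come from the message above.

```python
SECONDS_PER_DAY = 86_400


def noop_check_duration_days(doc_count, ids_per_sec_per_spindle, spindles=1):
    """Estimate how long the no-op revision checks take, in days.

    Assumes (hypothetically) that lookup throughput scales linearly
    with the number of spindles on the target.
    """
    ids_per_sec = ids_per_sec_per_spindle * spindles
    return doc_count / ids_per_sec / SECONDS_PER_DAY


# Figures from the message: 50 million documents, ~100 IDs/sec on one spindle.
days = noop_check_duration_days(50_000_000, 100)
print(f"{days:.1f} days of no-op checks")
```

With one spindle this works out to 500,000 seconds, a bit under six days.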
