On Mon, Dec 12, 2011 at 7:25 PM, Jason Smith <[email protected]> wrote:
> On Tue, Dec 13, 2011 at 8:03 AM, Paul Davis <[email protected]>
> wrote:
>> Having a UUID for every database created is the ideal
>> harmonious-to-theory manifestation of "what is a db?" but we have to
>> deal with reality when people may copy a file which makes things a bit
>> weird when there are two instances of a UUID db.
>
> You didn't say "harsh reality," but to list some legitimate situations
> where people might copy .couch files:
>
> * Restoring from backups
> * Cloning a VMWare image
> * Booting an EC2 AMI
> * NAS storage clusters
> * Couchbase mobile bootstrapping

Exactly the sorts of reasons why I haven't just slapped a UUID into the
db header. :D

>
>>> There's actually no problem with moving DBs around today, except that
>>> replication starts over (unless you change host names to match).
>>
>> The "except that replication starts over" is a very significant caveat
>> that I would say contradicts the entire "no problem" description.
>
> Nobody has shown that "replication starts over" is bad. The implicit
> assumption is that starting over is costly. At present, yes, that is
> true, but that's mostly a bunch of "no-op" round-trips diffing the
> revs.
>

No-op round trips are fine until you have to make millions of them over
edge networks. Obviously there hasn't been a huge uproar over this
inefficiency, or we'd be having a very different conversation. But the
fact remains that the current situation is just bad, and the only reason
there hasn't been an uproar is that we don't yet have a huge enterprise
company that's been running some phone replication db for fifteen years
without upgrading. It's always better to fix errors in our model before
they cause issues, though.

> If there were a hypothetical single query which let the receiver
> assess its exact relationship to an arbitrary sender's data, I don't
> think "starts over" would sound as awful.
>

I agree wholeheartedly. And the easiest way I see to make that happen
is to decouple the host and db identities so that such a query becomes
possible. It's possible there's something elegant we could pull from
things like Merkle trees. I've spent time considering it and haven't
thought of anything, but I'd be tickled pink if there were a reasonable
solution there.

> --
> Iris Couch
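
To make the trade-off in this thread concrete, here is a toy sketch of
the two identity schemes being discussed. It is not CouchDB's actual
replication ID algorithm: checkpoint_id, host_identity, uuid_identity,
and the dict-based "databases" are all hypothetical names invented for
illustration, and the only assumption is that a replication checkpoint
is looked up by a key derived from the identities of both endpoints.

```python
import hashlib
import uuid


def checkpoint_id(source_identity: str, target_identity: str) -> str:
    """Hypothetical checkpoint key: a hash over both endpoint identities."""
    raw = f"{source_identity}\n{target_identity}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def host_identity(db: dict) -> str:
    """Identity tied to where the database lives (assumed URL shape)."""
    return f"http://{db['host']}/{db['name']}"


def uuid_identity(db: dict) -> str:
    """Identity tied to a UUID stored with the database itself."""
    return db["uuid"]


# Toy records standing in for databases; every value here is made up.
source = {"host": "old-host:5984", "name": "notes", "uuid": uuid.uuid4().hex}
target = {"host": "phone.local:5984", "name": "notes", "uuid": uuid.uuid4().hex}

# Moving the file changes the host but nothing stored inside the file.
moved = dict(source, host="new-host:5984")

# Restoring a backup elsewhere gives a second live instance, same UUID.
restored = dict(source, host="backup-host:5984")

# Host-based key: the move changes the key, so the old checkpoint is
# never found and replication starts over.
host_key_survives_move = (
    checkpoint_id(host_identity(source), host_identity(target))
    == checkpoint_id(host_identity(moved), host_identity(target))
)

# UUID-based key: the move leaves the key untouched.
uuid_key_survives_move = (
    checkpoint_id(uuid_identity(source), uuid_identity(target))
    == checkpoint_id(uuid_identity(moved), uuid_identity(target))
)

# The catch: a copied file is indistinguishable from the original.
copies_share_identity = uuid_identity(source) == uuid_identity(restored)

print(host_key_survives_move, uuid_key_survives_move, copies_share_identity)
```

Under these assumptions the script prints "False True True": the
host-based key breaks on a move, the UUID-based key survives it, and
the restored copy collides with the original, which is the "two
instances of a UUID db" problem quoted above.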
