On Sun, Dec 11, 2011 at 8:19 PM, Randall Leeds <[email protected]> wrote:
> I proposed UUIDs for databases a long, long time ago and it's come up
> a few times since. If the UUID is database-level, then storing it with
> the database is dangerous -- copying a database file would result in
> two CouchDB's hosting "the same" (but really different) databases. If
> the UUID is host-level, then this reduces to a re-invention of DNS. In
> other words, all DBs should already be uniquely identified by their
> URLs.

Do people really copy databases? In that case a UUID for the DB
instance plus a UUID for the host should do fine. Host UUIDs can be
generated during CouchDB installation; that should be the easiest way.
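For what it's worth, the install-time UUID is only a few lines. A
minimal sketch, in Python for brevity (the path and helper name are
made up; a real installer would use whatever language it's written in):

    import os
    import uuid

    HOST_UUID_PATH = "/etc/couchdb/host_uuid"  # hypothetical location

    def get_host_uuid() -> str:
        """Return this installation's UUID, generating it on first use."""
        if os.path.exists(HOST_UUID_PATH):
            with open(HOST_UUID_PATH) as f:
                return f.read().strip()
        host_uuid = uuid.uuid4().hex  # random, minted once per install
        with open(HOST_UUID_PATH, "w") as f:
            f.write(host_uuid)
        return host_uuid

Because the identity lives next to the installation rather than inside
a database file, copying a .couch file to another machine doesn't carry
the host identity along with it.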
There's no good way to uniquely identify hosts, unfortunately (or
fortunately). MAC addresses are not reliable and the set of network
interfaces can change rapidly. And URLs are definitely out of the
question - I'm thinking of using my replicator in home devices that
might have duplicate host names and IP addresses assigned by DHCP.

> Regarding your second paragraph, replicating couches _could_ try to
> establish common ancestry only by examining a local checkpoint of
> replication, but the couch replicator looks for the log on both
> couches to ensure that the database hasn't been deleted+recreated nor
> has it crashed before certain replicated changes hit disk, as a double
> check that the sequence numbers have the expected shared meaning.

Yes, I guessed that's what ensure_full_commit is used for.

> It seems like maybe you're wondering about whether couch could
> generate snapshot ids that are more meaningful than the sequence
> number. For a single pair of couches the host-db-seq combo is enough
> information to replicate effectively. When there's more hosts involved
> we can talk about more powerful checkpoint ids that would be shareable
> or resolvable to find common ancestry between more than two
> replicating hosts to speed up those scenarios. My intuition always
> says that this leads to hash trees, but I haven't thought about it
> deeply enough to fully conceive of what this accomplishes or how it
> would work.

Hash trees are definitely interesting, especially since I really want
deterministic IDs for revisions. But their overhead is something to
consider: right now I'm at 150,000 document insertions/sec for
non-bulk updates, and I really like that speed.
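To make that concrete, here's the rough shape I have in mind - purely
an illustrative Python sketch, not anything CouchDB implements: a
Merkle tree over the update log, so two hosts can detect a shared
prefix by comparing subtree hashes instead of replaying sequences.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(updates) -> bytes:
        """Fold a list of update payloads into a single root hash."""
        if not updates:
            raise ValueError("need at least one update to hash")
        level = [h(u) for u in updates]
        while len(level) > 1:
            if len(level) % 2:            # duplicate last node on odd levels
                level.append(level[-1])
            level = [h(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    updates = [b"rev-1", b"rev-2", b"rev-3"]
    print(merkle_root(updates).hex())  # same inputs -> same root on any host

The root is deterministic, which is what I want for revision IDs, and
mismatching subtrees can be narrowed down in O(log n) comparisons. The
catch is that keeping the tree current costs O(log n) hash updates per
insertion, which is exactly the overhead I'd worry about at that rate.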
