On Sun, Dec 11, 2011 at 9:21 PM, Jason Smith <[email protected]> wrote:
> On Mon, Dec 12, 2011 at 9:52 AM, Paul Davis <[email protected]> wrote:
>> On Sun, Dec 11, 2011 at 7:19 PM, Randall Leeds <[email protected]> wrote:
>>> On Sun, Dec 11, 2011 at 04:00, Alex Besogonov <[email protected]> wrote:
>>>> I wonder, why there are no unique instance IDs in CouchDB? I'm
>>>> thinking about 'the central server replicates 2000000 documents to a
>>>> million of clients' scenario.
>>>>
>>>> Right now it's not possible to make replication on the 'big central
>>>> server' side to be stateless, because the other side tries to write
>>>> replication document which is later used to establish common ancestry.
>>>> Server can ignore/discard it, but then during the next replication
>>>> client would just have to replicate all the changes again. Of course,
>>>> the results would be consistent in any case but quite a lot of
>>>> additional traffic might be required.
>>>>
>>>> It should be simple to assign each instance a unique ID (computed
>>>> using UUID and the set of applied replication filters) and use it to
>>>> establish common replication history. It can even be compatible with
>>>> the way the current replication system works and basically the only
>>>> visible change should be the addition of UUID to database info.
>>>>
>>>> Or am I missing something?
>>>
>>> I proposed UUIDs for databases a long, long time ago and it's come up
>>> a few times since. If the UUID is database-level, then storing it with
>>> the database is dangerous -- copying a database file would result in
>>> two CouchDB's hosting "the same" (but really different) databases. If
>>> the UUID is host-level, then this reduces to a re-invention of DNS. In
>>> other words, all DBs should already be uniquely identified by their
>>> URLs.
>>>
>>> Regarding your second paragraph, replicating couches _could_ try to
>>> establish common ancestry only by examining a local checkpoint of
>>> replication, but the couch replicator looks for the log on both
>>> couches to ensure that the database hasn't been deleted+recreated nor
>>> has it crashed before certain replicated changes hit disk, as a double
>>> check that the sequence numbers have the expected shared meaning.
>>>
>>> It seems like maybe you're wondering about whether couch could
>>> generate snapshot ids that are more meaningful than the sequence
>>> number. For a single pair of couches the host-db-seq combo is enough
>>> information to replicate effectively. When there's more hosts involved
>>> we can talk about more powerful checkpoint ids that would be shareable
>>> or resolvable to find common ancestry between more than two
>>> replicating hosts to speed up those scenarios. My intuition always
>>> says that this leads to hash trees, but I haven't thought about it
>>> deeply enough to fully conceive of what this accomplishes or how it
>>> would work.
>>>
>>> -R
>>
>> I did have a shimmering of an idea for this awhile back. Basically we
>> do both host and db uuid's and the information we use to identify
>> replications is a hash of the concatenation.
>>
>> That way we can copy db's around and not muck with things as well as
>> error out a bit. Though this still has a bit of an issue if we copy
>> the host uuid around as well. Though we might be able to look for a
>> mac address or something and then fail to boot if the check fails
>> (with an optional override if someone changes a nic).
>
> A couch URL is its unique identifier. A database URL is its unique
> identifier. This sounds like a too-clever-by-half optimization. IMHO.
>
> --
> Iris Couch

To this I ask simply: What's the URL of my phone? Tying a URL to a
database is like identifying a person by their address.

A UUID per created database is much more fine-grained, but it has
operational issues with file handling and whatnot. Granted, the obvious
"what if there were a chip that uniquely identified all machines" is
kinda scary, so I think we just need to get close enough and warn users
when they might be headed for a world of hurt.

Also, "we support transitive replication" would make for an amusing
bullet point on the front page. :D
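
For what it's worth, here is a minimal sketch of the "hash of the
concatenation" idea quoted above. It is not how CouchDB does anything
today. The function name replication_id, the choice of SHA-256, and the
":" separator are all assumptions made up for illustration.

```python
import hashlib
import uuid


def replication_id(host_uuid: str, db_uuid: str) -> str:
    """Hypothetical helper: identify a replication endpoint by hashing
    the host UUID and the database UUID together."""
    # The separator is an assumption; it keeps ("ab", "c") and
    # ("a", "bc") from producing the same input to the hash.
    material = f"{host_uuid}:{db_uuid}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


# Two hosts, and one database file that was copied between them.
host_a = str(uuid.uuid4())
host_b = str(uuid.uuid4())
db = str(uuid.uuid4())

# The copied file keeps its db UUID, but the combined id differs per
# host, so the copy is not mistaken for the original.
print(replication_id(host_a, db) != replication_id(host_b, db))
```

The point of combining the two is that copying a database file alone no
longer produces two endpoints with the same identity. Copying the host
UUID along with it would still defeat the scheme, as noted in the
quoted text.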
