On Jan 11, 2010, at 11:21 AM, Paul Davis wrote:

>> My experience is that from time to time someone has a support request
>> where the symptom is "CouchDB is so slow as to be unusable" and the
>> answer is "set sequential uuids" and they are happy and CouchDB
>> "works" again.
>>
>> Support requests are like cockroaches: for every one you see there are
>> 100 others you don't. This math means the default random uuids are one
>> of the bigger bugs CouchDB ships with, and the switch to sequential is
>> one of the smallest patches with the biggest positive impacts we could
>> make.
>
> Well I wouldn't characterize random UUIDs as a bug, but yes they
> happen to exacerbate the worse side of the b~tree performance. Though
> I don't think that speed alone is reason enough to change the default.
>
>> The downsides to sequential uuids are these (unless I've missed one).
>>
>> Info leakage - the sequential uuids could give big brother an idea who
>> created a given document.
>>
>> Gives the wrong idea - people will do stupid things like use the _id
>> in lieu of a timestamp or the local_seq for ordering.
>>
>> Could be better - there's maybe an even better uuid algorithm we could
>> discover.
>>
>> I think the first case is important, but the others aren't that
>> compelling. Is there anything I'm missing?
>
> My biggest concern is that it gives a relative ordering and proximity
> information to documents created on a given node (and can spread
> between DBs). And it's a non-obvious leakage so that people may not
> realize that they're leaking such information. It may seem like an
> abstract concern but I think it's real enough to force users to make
> that decision.

I was the one who asked Chris to make the change. The current ids are the worst case for btree insert performance, slowing and bloating both doc inserts and view indexing.

I don't see leakage as a problem. I don't think we've ever claimed as a feature that our generated ids are somehow secure against someone figuring out when and where something might have been created, and I don't know of anyone relying on it. But I agree we should add to the documentation how ids are generated and what the implications are. If someone wants crypto random ids, they can configure it.

-Damien

>
> The sequential algorithm isn't time based, so its misuse doesn't
> really play into effect nearly as much as if we were going to try the
> utc_random algorithm.
>
> HTH,
> Paul Davis
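
To make the trade-off in this thread concrete, here is a minimal Python sketch, not CouchDB's Erlang code. It assumes the sequential scheme works roughly like this: a random 26-hex-character prefix followed by a 6-hex-character counter that advances by a random step, with a new prefix chosen when the counter overflows. The names `SequentialIds`, `random_id`, and `max_step` are hypothetical, and the step bound is an assumption chosen for illustration.

```python
import secrets


class SequentialIds:
    """Illustrative 'sequential'-style generator (assumed scheme, not CouchDB's code)."""

    SUFFIX_LIMIT = 0x1000000  # the counter must fit in 6 hex digits

    def __init__(self, max_step=0xFFE):
        self.max_step = max_step  # assumed upper bound on the random increment
        self._new_prefix()

    def _new_prefix(self):
        # 13 random bytes give a 26-hex-character prefix.
        self.prefix = secrets.token_hex(13)
        self.counter = secrets.randbelow(self.max_step) + 1

    def next_id(self):
        uid = f"{self.prefix}{self.counter:06x}"
        # Advance by a random step of at least 1, so ids keep increasing.
        self.counter += secrets.randbelow(self.max_step) + 1
        if self.counter >= self.SUFFIX_LIMIT:
            # Counter would overflow 6 hex digits: start over with a new prefix.
            self._new_prefix()
        return uid


def random_id():
    """Fully random 32-hex-character id, for comparison."""
    return secrets.token_hex(16)


gen = SequentialIds()
seq_ids = [gen.next_id() for _ in range(1000)]
rand_ids = [random_id() for _ in range(1000)]

# Sequential ids arrive already in sorted order, so b-tree inserts land
# next to each other instead of being scattered across the key space.
in_order = seq_ids == sorted(seq_ids)

# The same property is the leak discussed above: ids minted close together
# on one node share a prefix, which exposes relative order and proximity.
shared_prefix = len({uid[:26] for uid in seq_ids}) == 1
```

Under these assumptions, `in_order` and `shared_prefix` are both true for the sequential batch. The random ids have no such structure, which is why they are both the worst case for insert locality and the option that reveals nothing about when or where a document was created.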
