On Thu, Jun 18, 2009 at 05:24:04PM -0500, Rob Landley wrote:
> On Thursday 18 June 2009 14:56:53 Colin Watson wrote:
> > Is this an identical algorithm to that used in libuuid? It doesn't look
> > like it, and I'm not happy with increasing the probability that UUIDs
> > might turn out not to be universally unique. If you apply it this way
> > I'll certainly have to patch it out in the version I distribute ...
> 
> The probability that a 128 bit random number will not be unique is 2^128.  I 
> _guarantee_ you have no conception of how _big_ that is or you wouldn't be 
> worrying about it.
> 
> For discussion of whether or not the output of a supernova produces enough 
> total energy if you captured every last photon of it to cycle a theoretically 
> perfectly efficient computer through all the combinations of a a 2^128 bit 
> key 
> (for purposes of discussion, such a computer uses the transition state of a 
> single electron from least excited state of the innermost orbital of a 
> hydrogen atom to the next least excited state to check one combination), see 
> Bruce Schneier's "Applied Cryptography" (second edition) pages 157-158 
> ("Thermodynamic Limitations").
> 
> Your concerns about the uniqueness of 2 different properly random 128 bit 
> numbers matching show that you have _no_ understanding of scale.  The planet 
> you're standing on is way more likely to be hit by a comet that puts the 
> cockroaches in charge.  If you set every computer on the planet in a loop 
> trying to reproduce it and then wait for it to happen _once_, there'd be more 
> than a 99% probability that the sun would run out of fuel first.
> 
> (You can argue that /dev/urandom isn't _truly_ random, so maybe there's only 
> a 
> 95% chance the the sun would run out of fuel first, but I really don't care.)

Let me expand on my previous reply. I know I'm coming to this list as a
supplicant asking for a patch to be accepted, but on reflection I'm
actually not really willing to be patronised like this without defending
myself a bit.


While the metaphors above are certainly colourful, the code that Denys
posted does not have randomness even close to 2^128. rand() is entirely
deterministic based on the seed value you give to srand(). Denys' patch
uses monotonic_us() plus getpid() as the seed. Plenty of systems, such
as Linux, cap getpid() to 2^15 or thereabouts by default. Not exactly
very random, especially in an installer kind of situation where you may
have a fairly predictable sequence of events since boot, but let's
generously grant a flat distribution over 2^15.

Then we have time - well, whatever time may be, it isn't random. Sure,
you have microseconds, so it's a bit better than it might be, but that
hardly gives you 113 bits of randomness. CLOCK_MONOTONIC, which Denys'
patch used, is typically (and on Linux) implemented simply as a
monotonically increasing counter that's initialised to zero at boot, so
it has nothing to do with whether you're running at the same wall-clock
time as somebody else, only to do with how long your computer has been
running. mkswap is often used in that installer kind of situation I
mentioned, and in that case you probably have a range of maybe fifteen
minutes within which most of the runs are going to fall (obviously some
people will take ages to set up partitions, some people will go off and
have a coffee, or whatever, but if you're focusing on the task at hand
or if you're doing mass automatic deployments then they'll be quite
well-clustered). Converted to microseconds, that's 30 bits of
randomness.

Now, you can't add those 30 bits onto the 15 above because Denys' patch
just adds monotonic_us() and getpid() together; it doesn't XOR them or
anything. 2^30 + 2^15 is near enough to 2^30 to make no odds.
Furthermore ("among our weapons ..."), srand() takes an unsigned int as
its argument anyway, so you can't possibly do any better than 2^32 even
if you think the assumptions above are wrong.

Please bear in mind that I was replying specifically to Denys' patch,
not to an abstract concept of a patch that used proper random numbers.
The reason I'm concerned about it is that I *do* understand the scale of
these things and that patch (which is in master now) isn't in the right
ballpark at all. I spent several days and nights of my life last year on
very little sleep dealing with cleanup and mitigation of the notorious
Debian openssl bug, in which an accidental mispatch caused a shortage of
randomness such that only 2^15 distinct keys were possible for each key
type/length combination. I know what powers of two do, and the kinds of
thing that happen if you don't have enough of them.


Now, granted, it doesn't matter so much if swap UUIDs are really
universally unique when you compare them to something like SSH keys. But
let's consider the example of a large organisation deploying Linux
automatically on lots of machines. We can probably expect those machines
to be of roughly the same specification such that they'll reach similar
stages of the installer at about the same time, plus or minus a few
seconds, and we can certainly expect their deployment to be such that
the installer takes an identical or nearly identical code path each
time. I'd be surprised at this point if you had much more than 2^24
random bits to play with. The probability of real collisions within the
organisation is no longer negligible.

If you then use the same UUID generation algorithm for something more
important than swap, such as mke2fs, then it's actually quite plausible
that real failures will happen due to things like UUID mounts over
iSCSI, or swapping a disk between machines for some reason, or that kind
of thing. Still a rare failure, but I could well imagine it using up a
fair chunk of support time. From this point of view, the kinds of
failures that are rare but do still occasionally happen are often worse
than the kinds of failures that you see all the time, because you can
fix the latter fairly quickly and move on with your life.

I already know that *properly-generated* UUIDs are sufficiently
universally unique for all remotely sensible purposes. That's why I've
already deployed them for filesystem mounting in one distribution with
millions of users, and am working on deploying them in another ... It's
your mailing list, but it would make things a lot easier to have the
courtesy of not being assumed to be entirely naïve.


UUID generation is not all that far away from being a cryptographic
algorithm. Doing cryptography properly is hard work, and people put a
lot of effort into specifying algorithms that get it right. Doing
cryptography badly is very easy. Please don't do a bodge job on it for
the sake of a few hundred bytes that we could easily arrange for most
people to configure away. Mike Frysinger's entirely right that it should
be configurable - I rather doubt that the smallest embedded targets have
much use for UUID filesystem identification anyway!


The generation function you (Rob) sent in
http://lists.busybox.net/pipermail/busybox/2009-June/069690.html seems
much better to me and largely fine, except for two mistakes:

 * RFC2518 appears to be in error when it says that you should set the
   most significant bit of the first octet of the node ID to 1. RFC4122
   instead says you should set the *least* significant bit, which
   matches the description of IEEE 802 MAC addresses in
   http://en.wikipedia.org/wiki/MAC_address.

 * Your code uses uuid[11] as if it were the first octet of the node ID.
   I think this should in fact be uuid[10].

If I were writing it I think I'd follow libuuid's lead and add some
stirring about with rand() and such as a bit of an insurance policy just
in case /dev/urandom is broken (you'd still be screwed, of course, but
at least less screwed than always getting all zeroes or whatever), but
that may not be necessary in the context of busybox.

Can I recommend at least looking at libuuid and seeing what can be used,
though? It could be made a lot smaller simply by stripping out all the
time-based UUID stuff and assuming the existence of /dev/urandom, and as
I said above this should all be configured off by default.

Thanks,

-- 
Colin Watson                                       [[email protected]]
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to