On Friday 19 June 2009 02:30:37 Denys Vlasenko wrote:
> On Friday 19 June 2009 04:10, Colin Watson wrote:
> > While the metaphors above are certainly colourful, the code that Denys
> > posted does not have randomness even close to 2^128. rand() is entirely
> > deterministic based on the seed value you give to srand(). Denys' patch
> > uses monotonic_us() plus getpid() as the seed. Plenty of systems, such
> > as Linux, cap getpid() to 2^15 or thereabouts by default. Not exactly
> > very random, especially in an installer kind of situation where you may
> > have a fairly predictable sequence of events since boot, but let's
> > generously grant a flat distribution over 2^15.
> >
> > Then we have time - well, whatever time may be, it isn't random. Sure,
> > you have microseconds, so it's a bit better than it might be, but that
> > hardly gives you 113 bits of randomness. CLOCK_MONOTONIC, which Denys'
> > patch used, is typically (and on Linux) implemented simply as a
> > monotonically increasing counter that's initialised to zero at boot, so
> > it has nothing to do with whether you're running at the same wall-clock
> > time as somebody else, only to do with how long your computer has been
> > running. mkswap is often used in that installer kind of situation I
> > mentioned, and in that case you probably have a range of maybe fifteen
> > minutes within which most of the runs are going to fall (obviously some
> > people will take ages to set up partitions, some people will go off and
> > have a coffee, or whatever, but if you're focusing on the task at hand
> > or if you're doing mass automatic deployments then they'll be quite
> > well-clustered). Converted to microseconds, that's 30 bits of
> > randomness.
> >
> > Now, you can't add those 30 bits onto the 15 above because Denys' patch
> > just adds monotonic_us() and getpid() together; it doesn't XOR them or
> > anything. 2^30 + 2^15 is near enough to 2^30 to make no odds.
> > Furthermore ("among our weapons ..."), srand() takes an unsigned int as
> > its argument anyway, so you can't possibly do any better than 2^32 even
> > if you think the assumptions above are wrong.
>
> You are right.
>
> I didn't realize UUIDs need to be that unique.
>
> > Please bear in mind that I was replying specifically to Denys' patch,
> > not to an abstract concept of a patch that used proper random numbers.
> > The reason I'm concerned about it is that I *do* understand the scale of
> > these things and that patch (which is in master now) isn't in the right
> > ballpark at all. I spent several days and nights of my life last year on
> > very little sleep dealing with cleanup and mitigation of the notorious
> > Debian openssl bug, in which an accidental mispatch caused a shortage of
> > randomness such that only 2^15 distinct keys were possible for each key
> > type/length combination. I know what powers of two do, and the kinds of
> > thing that happen if you don't have enough of them.
>
> How about this?
>
>         i = open("/dev/urandom", O_RDONLY);
>         if (i >= 0) {
>                 read(i, buf, 16);
>                 close(i);
>         }
>         /* Paranoia. /dev/urandom may be missing.

Why not just xopen("/dev/urandom") and let it fail if it's not there?

Is it really that interesting to try to run mkswap before you've populated 
/dev when what you usually run it _on_ is a /dev node?  Your script can always 
mknod /dev/urandom by hand.  You can't run much of _anything_ without at least 
/dev/console...

> > Now, granted, it doesn't matter so much if swap UUIDs are really
> > universally unique when you compare them to something like SSH keys. But
> > let's consider the example of a large organisation deploying Linux
> > automatically on lots of machines. We can probably expect those machines
> > to be of roughly the same specification such that they'll reach similar
> > stages of the installer at about the same time, plus or minus a few
> > seconds, and we can certainly expect their deployment to be such that
> > the installer takes an identical or nearly identical code path each
> > time. I'd be surprised at this point if you had much more than 2^24
> > random bits to play with. The probability of real collisions within the
> > organisation is no longer negligible.
>
> Wow, they would mix their swap partitions! Run for your lives.

Yeah, that part confused me too.

> > The generation function you (Rob) sent in
> > http://lists.busybox.net/pipermail/busybox/2009-June/069690.html seems
> > much better to me and largely fine, except for two mistakes:
> >
> >  * RFC2518 appears to be in error when it says that you should set the
> >    most significant bit of the first octet of the node ID to 1. RFC4122
> >    instead says you should set the *least* significant bit, which
> >    matches the description of IEEE 802 MAC addresses in
> >    http://en.wikipedia.org/wiki/MAC_address.

P.S.  Aren't you glad I documented _why_ I was making those changes?  The 
generate_random() in the old_* stuff just tweaks a couple fields without 
particularly explaining why...

> >  * Your code uses uuid[11] as if it were the first octet of the node ID.
> >    I think this should in fact be uuid[10].
> >
> > If I were writing it I think I'd follow libuuid's lead
>
> That code is a good example of code explosion for no discernible reason
> which is so prevalent nowadays. No wonder just about anything today
> takes megabytes of code.
>
> Getting a list of present network cards because I am about to mkswap?!
> No thanks.

Actually there are historical reasons for some of it.  The reason they jumped 
through those hoops to generate uuids way back when is they wanted something 
globally unique, and way back when they didn't have cpuid or decent random 
number generators, so the network card ID and the clock were their main 
sources of presumed uniqueness.  (And if you think that's bad, remember that 
systems aren't guaranteed to have a network card in 'em!  Especially more than 
10 years ago...)

The tweaks to the big 128 bit random block are:

1) set the broadcast bit where the mac address would be.  Since querying a 
network card's mac address should never have the broadcast bit set, thus 
preventing the keyspace of randomly generated uuids from overlapping the 
keyspace of harvested IDs.

2) Claim to be a uuid format that old programs won't try to parse individual 
fields out of and get confused by.

By the way, I've encountered two network cards in the field with the same mac 
address.  Some crazy ultra-cheap korean vendor that used vendor ID + serial 
number and then restarted the serial number when they restarted the production 
line, possibly by accident...  Yes, they wound up on the same LAN.  Luckily I 
wasn't the one who had to track it down.  Luckily they were pci cards and not 
wired into the motherboard, so we could easily just throw one of 'em out.

As for the broadcast bit never actually being on a real card: back around qemu 
0.8.2 the virtual ethernet cards used to have mac addresses enumerated ala 
11:11:11:11:11:11, 22:22:22:22:22:22, 33:... and so on.  Notice that the first 
card doesn't have the broadcast bit set, but every second one does.  Yes, I'm 
the one who hit this.  Thought it was a kernel bug at first: the Linux kernel 
does _not_ like this, it gets _confused_.  These days qemu has some kind of 
stable vendor ID for the first 2 bytes and then go 12:34:56:78 I think.

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to