John Denker
Tue, 17 Aug 1999 14:09:46 -0700
Hi Ted -- At 11:41 PM 8/14/99 -0400, you wrote: > >standard Mathematician's style --- encrypted by formulae >guaranteed to make it opaque to all but those who are trained in the >peculiar style of Mathematics' papers. > ... >someone tried to pursuade me to use Maurer's test >... >too memory intensive and too CPU intensive You are very wise to be skeptical of mathematical mumbo-jumbo. You mentioned questions about efficiency, but I would like to call into question whether the entropy estimate provided by Maurer's Universal Statistical Test (MUST) would be suitable for our purposes, even if it could be computed for free. Don't be fooled by the Universal name. If you looked it up in a real-world dictionary, you might conclude that Universal means all-purpose or general-purpose. But if you look it up in a mathematical dictionary, you will find that a Universal probability distribution has the property that if we compare it to some other distribution, it is not lower by more than some constant factor. Alas, the "constant" depends on what two distributions are being compared, and there is no uniform bound on it! Oooops! In the language of entropy, a Universal entropy-estimator overestimates the entropy by no more than a constant -- but beware, there is no uniform upper bound on the constant. To illustrate this point, I have invented Denker's Universal Statistical Test (DUST) which I hereby disclose and place in the public domain: According to DUST, the entropy of a string is equal to its length. That's it! Now you may not *like* this test, and you may quite rightly decide that it is not suitable for your purposes, but my point is that according to the mathematical definitions, DUST is just as Universal as MUST. There are profound theoretical reasons to believe it is impossible to calculate a useful lower bound on the entropy of a string without knowing how it was prepared. There simply *cannot* be an all-purpose statistical test. If you were to make the mistake of treating a Universal estimator as an all-purpose estimator, and then applying it in a situation where the input might (in whole or in part) be coming from an adversary, you would lay yourself open to a chosen-seed attack (analogous to a chosen-plaintext attack). On the other side of the same coin, if you *do* know something about how the input was prepared, there obviously are things you can do to improve your estimate of its entropy. For example, in the early stages of a hardware RNG, you could use two input channels, sending the differential-mode signal to the next stage, and using the common-mode signal only for error checking. This is a good way to get rid of a certain type of interference, and could be quite useful in the appropriate circumstances. Returning to the ugly side of the coin, you can see that a small change in the way the inputs were prepared would make this differencing scheme worthless, possibly leading to wild overestimates of the entropy. BOTTOM LINE: *) Incorporating an all-purpose entropy-estimator into /dev/random is impossible. *) Incorporating something that *pretends* to be an all-purpose estimator is a Really Bad Idea. *) The present design appears to be the only sound design: whoever provides the inputs is responsible for providing the estimate of the entropy thereof. If no estimate is provided, zero entropy is attributed. Cheers --- jsd