In the context of

>> 0 occurs with probability 1/2
>> each other number from 1 to 2^{160}+1 happens with
>> probability 2^{-161}.

I wrote:
> This ... serves to illustrate, in an exaggerated way, the necessity
> of not assuming that the raw data words are IID (independent and
> identically distributed).

Correction: IID isn't the issue here. The raw data words described
above are IID. That's not the problem. The problem is that the
distribution is highly skewed.

Executive summary: Small samples do not always exhibit "average" behavior.

Let me explain. There is a very simple way to look at this. Consider
the expression for the entropy of the source,
        S := Sum_i P_i log(1/P_i)                [1]
where i runs over all symbols in the alphabet. One may, with a little
care, attribute log(1/P_i) bits of unpredictability to the (i)th symbol
in the alphabet. Then S can be interpreted as the appropriately
weighted _average_ unpredictability per symbol.

In the example quoted above, the minimum log(1/P_i) is vastly smaller
than the average: 1 bit for the most likely symbol versus 161 bits for
each of the others, for a weighted average S of about 81 bits. So
focussing on the average is unlikely to solve all the world's problems.

In crypto applications (including RNG construction), a crude yet
provably correct approach is to rely on the minimum (not the average)
per-symbol unpredictability. Using this approach, it would require 80
samples of the given distribution to produce an output with 80 bits of
entropy.

Things can only get better from here:
 -- With full knowledge of the source statistics, one can distinguish
    the large log(1/P_i) words from the small log(1/P_i) words. This
    would allow the number of required samples to be closer to the
    typical value (2) than to the worst-case value (80). An example of
    this is the Huffman coding I discussed in my previous note.
 -- With even modest knowledge of the given source, one can (by
    histogramming if nothing else) estimate the probabilities well
    enough to yield worthwhile improvements in efficiency.
 -- If you want to produce a long sequence of output words, as you
    often do, a reasonable-sized buffer is all you really need to come
    fairly close to optimal efficiency, namely only about 1 sample of
    the distribution, on average, per 80-bit output word. The chance
    of getting fooled can be made verrry small, at a modest cost in
    terms of buffer-size and extra samples of the distribution. That
    is, the law of large numbers comes to your rescue sooner or later,
    usually rather soon.
 -- It may be possible to engineer a different source with a larger
    minimum log(1/P_i).

Bottom line: Setting up a highly-skewed distribution and then drawing
from it only a single sample is not guaranteed to produce "average"
behavior. Duh, no surprise there!

The source entropy S in equation [1] is a single number that
characterizes the "average" behavior. For a small sample from a
highly-skewed distribution, S doesn't tell you everything you need to
know. This has no bearing on the definition of entropy; entropy is
defined by equation [1]. It just means that equation [1] by itself
doesn't solve all the world's problems.
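
For concreteness, here is a tiny Python sketch of the arithmetic above.
It assumes the quoted source has exactly 2^160 unlikely symbols, each
with probability 2^{-161}, so that the probabilities sum to 1; that
reading of the quoted example is mine.

    from math import log2

    p_zero = 0.5            # probability of the symbol 0
    p_other = 2.0 ** -161   # probability of each unlikely symbol
    n_other = 2 ** 160      # assumed number of unlikely symbols

    # Shannon entropy S = Sum_i P_i log(1/P_i), equation [1],
    # computed by grouping the equal terms.
    S = p_zero * log2(1 / p_zero) + n_other * p_other * log2(1 / p_other)

    # Min-entropy: the unpredictability credited to the *most likely* symbol.
    H_min = log2(1 / p_zero)

    print(S)           # 81.0 bits per sample, on average
    print(H_min)       # 1.0 bit per sample, guaranteed
    print(80 / H_min)  # 80 samples needed for 80 bits, via the crude approach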
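
And here is an equally rough sketch of the buffering bookkeeping from
the third bullet above, under the same assumed statistics. It is not an
actual extractor; it merely credits each sample with log(1/P_i) bits
(1 bit for a zero, 161 bits otherwise) and emits an 80-bit output word
whenever the buffer holds at least 80 credited bits. The long-run
average comes out near 1 sample per output word, roughly 80/81.

    import random

    def samples_per_word(n_words, seed=0):
        rng = random.Random(seed)
        credit = 0.0    # unpredictability accumulated in the buffer, in bits
        samples = 0
        words = 0
        while words < n_words:
            samples += 1
            # Draw from the skewed source: 0 with probability 1/2,
            # otherwise one of the unlikely symbols.
            if rng.random() < 0.5:
                credit += 1.0    # log2(1/(1/2)) bits for the likely symbol
            else:
                credit += 161.0  # log2(2^161) bits for an unlikely symbol
            while credit >= 80.0 and words < n_words:
                credit -= 80.0   # emit one 80-bit output word from the buffer
                words += 1
        return samples / n_words

    print(samples_per_word(100000))   # close to 80/81 ~ 0.99 samples per word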