In the context of

>> 0 occurs with probability 1/2
>> each other number from 1 to 2^{160}+1 happens with
>> probability 2^{-161}.

I wrote:
> This ... serves to illustrate, in an exaggerated way, the necessity
> of not assuming that the raw data words are IID (independent and
> identically distributed).

Correction: IID isn't the issue here. The raw data words described
above are IID. That's not the problem. The problem is that the
distribution is highly skewed.

Executive summary: Small samples do not always exhibit "average" behavior.

Let me explain. There is a very simple way to look at this. Consider
the expression for the entropy of the source,
        S := Sum_i P_i log(1/P_i)                [1]
where i runs over all symbols in the alphabet. One may, with a little
care, attribute log(1/P_i) bits of unpredictability to the (i)th symbol
in the alphabet. Then S can be interpreted as the appropriately
weighted _average_ unpredictability per symbol.

In the example quoted above, the minimum log(1/P_i) is vastly smaller
than the average: 1 bit for the most likely symbol versus 161 bits for
each of the others, for a weighted average S of about 81 bits. So
focussing on the average is unlikely to solve all the world's problems.

In crypto applications (including RNG construction), a crude yet
provably correct approach is to rely on the minimum (not the average)
per-symbol unpredictability. Using this approach, it would require 80
samples of the given distribution to produce an output with 80 bits of
entropy.

Things can only get better from here:
 -- With full knowledge of the source statistics, one can distinguish
    the large log(1/P_i) words from the small log(1/P_i) words. This
    would allow the number of required samples to be closer to the
    typical value (2) than to the worst-case value (80). An example of
    this is the Huffman coding I discussed in my previous note.
 -- With even modest knowledge of the given source, one can (by
    histogramming if nothing else) estimate the probabilities well
    enough to yield worthwhile improvements in efficiency.
 -- If you want to produce a long sequence of output words, as you
    often do, a reasonable-sized buffer is all you really need to come
    fairly close to optimal efficiency, namely only about 1 sample of
    the distribution, on average, per 80-bit output word. The chance
    of getting fooled can be made verrry small, at a modest cost in
    terms of buffer-size and extra samples of the distribution. That
    is, the law of large numbers comes to your rescue sooner or later,
    usually rather soon.
 -- It may be possible to engineer a different source with a larger
    minimum log(1/P_i).

Bottom line: Setting up a highly-skewed distribution and then drawing
from it only a single sample is not guaranteed to produce "average"
behavior. Duh, no surprise there!

The source entropy S in equation [1] is a single number that
characterizes the "average" behavior. For a small sample from a
highly-skewed distribution, S doesn't tell you everything you need to
know. This has no bearing on the definition of entropy; entropy is
defined by equation [1]. It just means that equation [1] by itself
doesn't solve all the world's problems.
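
For concreteness, here is a tiny Python sketch of the arithmetic above.
It assumes the quoted source has exactly 2^160 unlikely symbols, each
with probability 2^{-161}, so that the probabilities sum to 1; that
reading of the quoted example is mine.

    from math import log2

    p_zero = 0.5            # probability of the symbol 0
    p_other = 2.0 ** -161   # probability of each unlikely symbol
    n_other = 2 ** 160      # assumed number of unlikely symbols

    # Shannon entropy S = Sum_i P_i log(1/P_i), equation [1],
    # computed by grouping the equal terms.
    S = p_zero * log2(1 / p_zero) + n_other * p_other * log2(1 / p_other)

    # Min-entropy: the unpredictability credited to the *most likely* symbol.
    H_min = log2(1 / p_zero)

    print(S)           # 81.0 bits per sample, on average
    print(H_min)       # 1.0 bit per sample, guaranteed
    print(80 / H_min)  # 80 samples needed for 80 bits, via the crude approach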
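
And here is an equally rough sketch of the buffering bookkeeping from
the third bullet above, under the same assumed statistics. It is not an
actual extractor; it merely credits each sample with log(1/P_i) bits
(1 bit for a zero, 161 bits otherwise) and emits an 80-bit output word
whenever the buffer holds at least 80 credited bits. The long-run
average comes out near 1 sample per output word, roughly 80/81.

    import random

    def samples_per_word(n_words, seed=0):
        rng = random.Random(seed)
        credit = 0.0    # unpredictability accumulated in the buffer, in bits
        samples = 0
        words = 0
        while words < n_words:
            samples += 1
            # Draw from the skewed source: 0 with probability 1/2,
            # otherwise one of the unlikely symbols.
            if rng.random() < 0.5:
                credit += 1.0    # log2(1/(1/2)) bits for the likely symbol
            else:
                credit += 161.0  # log2(2^161) bits for an unlikely symbol
            while credit >= 80.0 and words < n_words:
                credit -= 80.0   # emit one 80-bit output word from the buffer
                words += 1
        return samples / n_words

    print(samples_per_word(100000))   # close to 80/81 ~ 0.99 samples per word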