On Thu, 6 Sep 2007 09:28:40 -0400 (EDT) "Leichter, Jerry" <[EMAIL PROTECTED]> wrote:
> | Hi Martin,
> |
> | I did forget to say that it would be salted so that throws it off by
> | 2^12
> |
> | A couple of questions.  How did you come up with the ~2.5 bits per
> | word?  Would a longer word have more bits?
>
> He misapplied an incorrect estimate! :-)  The usual estimate - going
> back to Shannon's original papers on information theory, actually - is
> that natural English text has about 2.5 (I think it's usually given as
> 2.4) bits of entropy per *character*.  There are several problems
> here:

It's less than that.  See, for example, the bottom of the first page of
http://www.cs.brown.edu/courses/cs195-5/extras/shannon-1951.pdf :

	From this analysis it appears that, in ordinary literary
	English, the long range statistical effects (up to 100 letters)
	reduce the entropy to something of the order of one bit per
	letter, with a corresponding redundancy of roughly 75%.  The
	redundancy may be still higher when structure extending over
	paragraphs, chapters, etc. is included.

>	- The major one is that the estimate should be for
>	  *characters*, not *words*.  So the number of bits of entropy
>	  in a 55-character phrase is about 137 (132, if you use
>	  2.4 bits/character), not 30.
>
>	- The minor one is that the English entropy estimate looks
>	  just at letters and spaces, not punctuation and
>	  capitalization.  So it's probably low anyway.  However, this
>	  is a much smaller effect.

The interesting question is whether or not one can effectively
enumerate candidate phrases for a guessing program.  For that problem,
punctuation and capitalization are important.


		--Steve Bellovin, http://www.cs.columbia.edu/~smb

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
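The arithmetic in the thread can be sketched quickly.  This is a minimal
Python illustration (the function name `passphrase_bits` is my own, not
from the thread); it just multiplies phrase length by the assumed
per-character entropy rate, using the three rates discussed above: the
textbook 2.5 and 2.4 bits/character figures and Shannon's long-range
estimate of roughly 1 bit/letter.

```python
# Rough passphrase-entropy estimates under different per-character rates.
# Rates taken from the thread: 2.5 and 2.4 bits/char (the usual textbook
# figures) and ~1.0 bit/char (Shannon's 1951 long-range estimate).

def passphrase_bits(length: int, bits_per_char: float) -> float:
    """Estimated entropy (bits) of a natural-language phrase of `length`
    characters, assuming a uniform per-character entropy rate."""
    return length * bits_per_char

if __name__ == "__main__":
    for rate in (2.5, 2.4, 1.0):
        print(f"{rate} bits/char x 55 chars = "
              f"{passphrase_bits(55, rate):.1f} bits")
```

At 2.5 bits/character a 55-character phrase gives 137.5 bits, matching
the "about 137" figure quoted above; at Shannon's ~1 bit/letter the same
phrase carries only about 55 bits, which is why the long-range estimate
matters so much for guessing attacks.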
