On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gen...@dhaller.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc.  You'll definitely benefit from compression,
>>though - your text files full of digits are encoding 3.3 bits of
>>information in an 8-bit ASCII character, and even if the order of
>>digits in pi can be treated as purely random, just about any
>>compression algorithm is going to get pretty close to that 3.3 bits
>>per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
>         41662.5
> and I get, from lzip:
> $ calc 44543*8/101000
>         3.528...        (bits/digit)
> to zip:
> $ calc 49696*8/101000
>         ~3.93           (bits/digit)

Actually, I'm surprised how far off of this the various methods are.
I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible set of 96
digits into a 40-byte code (that is just a straight decimal-to-binary
conversion: 96 digits carry 96*log2(10) ~ 318.9 bits, which fits in
320 bits).  Then read a "word" of 96 digits at a time and translate
it.  That only wastes 320/96 - log2(10) ~ 0.011 bits per digit.
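Something like this quick Python sketch would do it (function names
are mine, just to illustrate the idea; 96 digits fit in 40 bytes
because 10^96 < 2^320):

  def pack_digits(digits):
      # Pack a string of 96 decimal digits into 40 bytes (big-endian).
      assert len(digits) == 96 and digits.isdigit()
      return int(digits).to_bytes(40, 'big')

  def unpack_digits(blob):
      # Recover the original 96-digit string, restoring leading zeros.
      return str(int.from_bytes(blob, 'big')).zfill(96)

  # Round trip: unpack_digits(pack_digits("12" * 48)) == "12" * 48
  # 320 bits / 96 digits ~ 3.333 bits/digit vs log2(10) ~ 3.322,
  # i.e. about 0.011 bits/digit of waste.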

--
Rich
