On 2014-10-31, Rich Freeman <ri...@gentoo.org> wrote:
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gen...@dhaller.de> wrote:
>>
>> On Fri, 31 Oct 2014, Rich Freeman wrote:
>>
>>> I can't imagine that any tool will do much better than something like
>>> lzo, gzip, xz, etc.  You'll definitely benefit from compression though
>>> - your text files full of digits are encoding 3.3 bits of information
>>> in an 8-bit ascii character and even if the order of digits in pi can
>>> be treated as purely random just about any compression algorithm is
>>> going to get pretty close to that 3.3 bits per digit figure.
>>
>> Good estimate:
>>
>> $ calc '101000/(8/3.3)'
>>         41662.5
>>
>> and I get from (lzip)
>> $ calc '44543*8/101000'
>>         3.528... (bits/digit)
>>
>> to zip:
>> $ calc '49696*8/101000'
>>         ~3.93 (bits/digit)
>
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion).  Then read a "word" at a time and translate it.  This
> will only waste 0.011 bits per digit.
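[A minimal Python sketch of the packing Rich describes, for the curious.
96 decimal digits fit in 40 bytes because 10^96 < 2^320 = 256^40; the
function names are mine, and handling of a short final chunk is left out.]

```python
import math

def pack_digits(digits: str) -> bytes:
    """Encode exactly 96 ASCII decimal digits as a 40-byte block.

    Straight decimal-to-binary conversion: the 96-digit value is at most
    10**96 - 1, which is below 2**320, so it always fits in 40 bytes.
    """
    assert len(digits) == 96 and digits.isdigit()
    return int(digits).to_bytes(40, "big")

def unpack_digits(block: bytes) -> str:
    """Decode a 40-byte block back to 96 digits (restoring leading zeros)."""
    return str(int.from_bytes(block, "big")).zfill(96)

# Overhead: 320 bits / 96 digits vs. the log2(10) bits/digit entropy bound.
waste_per_digit = 320 / 96 - math.log2(10)   # ~0.011 bits per digit
```

[So each 96-digit "word" costs 320/96 ≈ 3.333 bits/digit against the
log2(10) ≈ 3.322 bound, which is where the 0.011 figure comes from.]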
You're cheating.  The algorithm you tested will compress strings of
arbitrary 8-bit values.  The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.

-- 
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                               gmail.com           SALAD!!