On 2014-10-31, Rich Freeman <ri...@gentoo.org> wrote:
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gen...@dhaller.de> wrote:
>>
>> On Fri, 31 Oct 2014, Rich Freeman wrote:
>>
>>> I can't imagine that any tool will do much better than something like
>>> lzo, gzip, xz, etc.  You'll definitely benefit from compression though
>>> - your text files full of digits are encoding 3.3 bits of information
>>> in an 8-bit ascii character and even if the order of digits in pi can
>>> be treated as purely random just about any compression algorithm is
>>> going to get pretty close to that 3.3 bits per digit figure.
>>
>> Good estimate:
>>
>> $ calc '101000/(8/3.3)'
>>         41662.5
>>
>> and I get from (lzip)
>> $ calc '44543*8/101000'
>>         3.528... (bits/digit)
>>
>> to zip:
>> $ calc '49696*8/101000'
>>         ~3.93 (bits/digit)
>
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion).  Then read a "word" at a time and translate it.  This
> will only waste 0.011 bits per digit.
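[A minimal Python sketch of the packing Rich describes, for the curious.
96 decimal digits fit in 40 bytes because 10^96 < 2^320 = 256^40; the
function names are mine, and handling of a short final chunk is left out.]

```python
import math

def pack_digits(digits: str) -> bytes:
    """Encode exactly 96 ASCII decimal digits as a 40-byte block.

    Straight decimal-to-binary conversion: the 96-digit value is at most
    10**96 - 1, which is below 2**320, so it always fits in 40 bytes.
    """
    assert len(digits) == 96 and digits.isdigit()
    return int(digits).to_bytes(40, "big")

def unpack_digits(block: bytes) -> str:
    """Decode a 40-byte block back to 96 digits (restoring leading zeros)."""
    return str(int.from_bytes(block, "big")).zfill(96)

# Overhead: 320 bits / 96 digits vs. the log2(10) bits/digit entropy bound.
waste_per_digit = 320 / 96 - math.log2(10)   # ~0.011 bits per digit
```

[So each 96-digit "word" costs 320/96 ≈ 3.333 bits/digit against the
log2(10) ≈ 3.322 bound, which is where the 0.011 figure comes from.]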
You're cheating.  The algorithm you tested will compress strings of
arbitrary 8-bit values.  The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.

-- 
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                               gmail.com           SALAD!!