Hello,
On Fri, 31 Oct 2014, Rich Freeman wrote:
>On Fri, Oct 31, 2014 at 11:59 AM, <[email protected]> wrote:
>> I am currently checking the compression tools I know of for the
>> best compression ratio. But I will definitely miss those I don't
>> know...
>> And sometimes one can do magic with options and switches of such
>> tools that I also don't know of.
With 100k pseudo-random digits from bash's $RANDOM % 10 and a
linebreak every 100 digits (in t.lst) I get this (each with --best /
-9 / -m5 (rar) compression-level option):
$ du -b * | sort -rn
101000 t.lst
61544 t.lzop
50733 t.zoo
49696 t.zip
49609 t.lha
49554 t.gz
48907 t.Z
44942 t.rar
44661 t.rzip
44638 t.7z
44592 t.xz
44572 t.bz2
44546 t.lzma
44543 t.lzip
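A rough sketch of how such a test file can be generated and measured
(assuming bash for $RANDOM; only gzip and xz are shown here, the other
tools work analogously):

```shell
# Sketch: build t.lst with 100,000 pseudo-random digits from $RANDOM % 10,
# 100 digits per line, then compress at the highest level and compare sizes.
for i in $(seq 1 1000); do
    line=""
    for j in $(seq 1 100); do
        line="$line$((RANDOM % 10))"
    done
    echo "$line"
done > t.lst

gzip -9 -c t.lst > t.gz    # --best is an alias for -9
xz   -9 -c t.lst > t.xz
du -b t.lst t.gz t.xz | sort -rn
```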
What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;) And the above is probably a fairly comprehensive
list; except for .Z, .gz and .bz2, all are named after the binaries
used to create them.
I'd use bzip2/xz/lzip, as there are e.g. [blx]z(e)(grep|cat|less) but
no 7zgrep, and I guess those tools ease access to such archives quite
a bit.
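For illustration, here is a small hypothetical example with the .gz
family of those helpers (zgrep/zcat; the xz and bzip2 variants work
the same way):

```shell
# Hypothetical example: search and view a compressed file without
# unpacking it first (xzgrep/xzcat and bzgrep/bzcat are analogous).
printf '3141592653\n2718281828\n' > digits.txt
gzip -9 -c digits.txt > digits.gz
zgrep 31415 digits.gz        # prints the matching line
zcat digits.gz | head -n 1   # streams the decompressed contents
```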
>I can't imagine that any tool will do much better than something like
>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>- your text files full of digits are encoding 3.3 bits of information
>in an 8-bit ascii character and even if the order of digits in pi can
>be treated as purely random just about any compression algorithm is
>going to get pretty close to that 3.3 bits per digit figure.
Good estimate:
$ calc '101000/(8/3.3)'
41662.5
and the actual results range from lzip:
$ calc 44543*8/101000
3.528... (bits/digit)
up to zip:
$ calc 49696*8/101000
~3.93 (bits/digit)
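The 3.3 bits/digit figure is log2(10), the entropy of a uniformly
random decimal digit; a quick sanity check (using python3 here instead
of calc):

```shell
# Entropy of a uniform decimal digit, and the resulting lower bound for
# 100,000 digits (newlines excluded, hence slightly below the 41662.5
# estimated above from the full 101000-byte file).
python3 -c 'import math
b = math.log2(10)
print(round(b, 4))            # bits per digit -> 3.3219
print(round(100000 * b / 8))  # minimum bytes for the digits alone'
```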
HTH,
-dnh
--
Q: Hobbies?
A: Hating music. -- Marvin