Hello,
On Fri, 31 Oct 2014, Rich Freeman wrote:
>On Fri, Oct 31, 2014 at 11:59 AM, <[email protected]> wrote:
>> I am currently checking the compression tools I know of for the
>> best compression ratio. But I will definitely miss those I don't
>> know...
>> And sometimes one can do magic with options and switches of such
>> tools that I also don't know of.
With 100k pseudo-random digits from bash's $RANDOM % 10 and a
linebreak every 100 digits (in t.lst) I get this (each with --best /
-9 / -m5 (rar) compression-level option):
$ du -b * | sort -rn
101000 t.lst
61544 t.lzop
50733 t.zoo
49696 t.zip
49609 t.lha
49554 t.gz
48907 t.Z
44942 t.rar
44661 t.rzip
44638 t.7z
44592 t.xz
44572 t.bz2
44546 t.lzma
44543 t.lzip
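A rough sketch of how such a test file can be generated and measured
(assuming bash for $RANDOM; only gzip and xz are shown here, the other
tools work analogously):

```shell
# Sketch: build t.lst with 100,000 pseudo-random digits from $RANDOM % 10,
# 100 digits per line, then compress at the highest level and compare sizes.
for i in $(seq 1 1000); do
    line=""
    for j in $(seq 1 100); do
        line="$line$((RANDOM % 10))"
    done
    echo "$line"
done > t.lst

gzip -9 -c t.lst > t.gz    # --best is an alias for -9
xz   -9 -c t.lst > t.xz
du -b t.lst t.gz t.xz | sort -rn
```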
What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;) And the above is probably a fairly comprehensive
list; except for .Z, .gz and .bz2, all are named after the binaries
used to create them.
I'd use bzip2/xz/lzip, as there are e.g. [blx]z(e)(grep|cat|less) but
no 7zgrep, and I guess those tools ease access to such archives quite
a bit.
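For illustration, here is a small hypothetical example with the .gz
family of those helpers (zgrep/zcat; the xz and bzip2 variants work
the same way):

```shell
# Hypothetical example: search and view a compressed file without
# unpacking it first (xzgrep/xzcat and bzgrep/bzcat are analogous).
printf '3141592653\n2718281828\n' > digits.txt
gzip -9 -c digits.txt > digits.gz
zgrep 31415 digits.gz        # prints the matching line
zcat digits.gz | head -n 1   # streams the decompressed contents
```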
>I can't imagine that any tool will do much better than something like
>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>- your text files full of digits are encoding 3.3 bits of information
>in an 8-bit ascii character and even if the order of digits in pi can
>be treated as purely random just about any compression algorithm is
>going to get pretty close to that 3.3 bits per digit figure.
Good estimate:
$ calc '101000/(8/3.3)'
41662.5
and the actual results range from lzip:
$ calc 44543*8/101000
3.528... (bits/digit)
up to zip:
$ calc 49696*8/101000
~3.93 (bits/digit)
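The 3.3 bits/digit figure is log2(10), the entropy of a uniformly
random decimal digit; a quick sanity check (using python3 here instead
of calc):

```shell
# Entropy of a uniform decimal digit, and the resulting lower bound for
# 100,000 digits (newlines excluded, hence slightly below the 41662.5
# estimated above from the full 101000-byte file).
python3 -c 'import math
b = math.log2(10)
print(round(b, 4))            # bits per digit -> 3.3219
print(round(100000 * b / 8))  # minimum bytes for the digits alone'
```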
HTH,
-dnh
--
Q: Hobbies?
A: Hating music. -- Marvin