Hello,

On Fri, 31 Oct 2014, Rich Freeman wrote:
>On Fri, Oct 31, 2014 at 11:59 AM, <meino.cra...@gmx.de> wrote:
>> I am currently checking the compression tools I know of for the
>> best compression ratio. But I will definitely miss those I don't
>> know...
>> And sometimes one can do magic with options and switches of that
>> kind of tools I also don't know of.
With 100k pseudo-random digits from bash's $RANDOM % 10 and a linebreak
every 100 digits (in t.lst) I get this (each with the --best / -9 /
-m5 (rar) compression-level option):

$ du -b * | sort -rn
101000  t.lst
 61544  t.lzop
 50733  t.zoo
 49696  t.zip
 49609  t.lha
 49554  t.gz
 48907  t.Z
 44942  t.rar
 44661  t.rzip
 44638  t.7z
 44592  t.xz
 44572  t.bz2
 44546  t.lzma
 44543  t.lzip

What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;)

And the above is probably a quite comprehensive list, and except for
.Z, .gz and .bz2, all are named after the binaries used to create
them. I'd use bzip2/xz/lz, as there are e.g. [blx]z(e)(grep|cat|less),
but no e.g. 7zgrep, and I guess they can ease access to those archives
quite a bit.

>I can't imagine that any tool will do much better than something like
>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>- your text files full of digits are encoding 3.3 bits of information
>in an 8-bit ascii character and even if the order of digits in pi can
>be treated as purely random just about any compression algorithm is
>going to get pretty close to that 3.3 bits per digit figure.

Good estimate:

$ calc '101000/(8/3.3)'
        41662.5

and I get from (lzip)

$ calc '44543*8/101000'
        ~3.528 (bits/digit)

to zip:

$ calc '49696*8/101000'
        ~3.93 (bits/digit)

HTH,
-dnh

-- 
Q: Hobbies?
A: Hating music.                                              -- Marvin
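P.S.: For anyone wanting to reproduce the comparison above, here is a
minimal sketch of how such a t.lst and size listing could be generated
(bash assumed; only gzip is assumed installed, bzip2/xz are tried if
present — note that $RANDOM % 10 is very slightly biased, since
$RANDOM spans 0..32767, which is not a multiple of 10):

```shell
#!/bin/bash
# Generate 100000 pseudo-random decimal digits, 100 per line (t.lst),
# as described in the post: 1000 lines x 100 digits + newline = 101000 bytes.
for i in $(seq 1 1000); do
    line=
    for j in $(seq 1 100); do
        line+=$((RANDOM % 10))
    done
    printf '%s\n' "$line"
done > t.lst

# Compress at the highest level; -k keeps the original file around.
gzip -9 -k t.lst                                  # -> t.lst.gz
for tool in bzip2 xz; do                          # optional extras
    command -v "$tool" >/dev/null && "$tool" -9 -k t.lst
done

# Compare the resulting sizes, largest first.
du -b t.lst* | sort -rn
```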
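P.P.S.: The 3.3 bits/digit figure Rich mentions is just log2(10); a
quick sanity check with awk (any POSIX awk assumed), for those without
calc installed:

```shell
# Entropy of one uniformly random decimal digit, in bits:
awk 'BEGIN { printf "%.4f\n", log(10)/log(2) }'                # 3.3219

# Lower bound, in bytes, for 100000 such digits (newlines excluded):
awk 'BEGIN { printf "%.0f\n", 100000 * log(10)/log(2) / 8 }'   # 41524
```

That lands close to the 41662.5 estimated above from the rounder 3.3
figure applied to all 101000 bytes, and a bit below what the best
compressors actually reach (~44.5k).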