Hello,

On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote:
> On Sun, 11 May 2014 18:26:32 -0500
> Gordon Pettey <[email protected]> wrote:
> > > A lot of small files (e.g. AUTHORS, ChangeLog
> >
> > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
> > /usr/share/doc. A short script to decompress those and recompress with xz
> > -6e reduced that to 36M.
>
> Very strange o_O
>
> Here is my test results. xz options: "--lzma2=preset=6e,dict=4MiB".
> Larger dictionary size does not improve compression ratio, I get
> even worse results with just "-6e" or "-9e". man-bz2 is a full copy of
> my /usr/share/man, man-xz is a recompressed one.
>
> Size comparison:
>
> $ du -s man-bz2/ man-xz/
> 82032 man-bz2/
> 82308 man-xz/

Please consider that by default du shows disk usage counted in whole blocks,
not the apparent size in bytes. That means that if a file is actually 1234
bytes large, without -b it will still be accounted for as 4096 bytes on a
filesystem with 4K blocks.

Here are my results:

1. With bzip2 -9:

find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -bhc --files0-from -
63M
find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -hc --files0-from -
146M
find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -bhc --files0-from -
151M total
find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -hc --files0-from -
249M total

2. With xz -9e:

find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -bhc --files0-from -
64M
find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -hc --files0-from -
146M
find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -bhc --files0-from -
147M total
find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -hc --files0-from -
245M total

As one can see, on man pages xz is slightly worse in apparent file size
(64M vs 63M) and makes no difference in real disk usage (146M in both
cases). On docs xz is better by both measures (147M vs 151M apparent,
245M vs 249M on disk).

As for decompression speed, xz is about twice as fast as bzip2 for large
man pages (bash, mplayer, cmake, zshall), though this speed gain still
needs to be measured directly for the bunzip2 and unxz tools. I'll publish
statistically meaningful results later; both scripting and testing require
time.

Best regards,
Andrew Savchenko
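
The block-size effect described above can be illustrated with a small shell sketch. It is a minimal example, assuming GNU coreutils stat(1) and a filesystem that allocates whole blocks; the 4096-byte block size is only an example value, and the helpers blocks_used and compare_file are hypothetical names, not part of any existing tool.

```shell
#!/bin/sh
# Minimal sketch with hypothetical helpers: apparent size vs. allocated size.
# Assumes GNU coreutils stat(1); 4096 below is only an example block size.

# Round a byte count up to a whole number of blocks. For an ordinary,
# non-sparse file this approximates what plain `du` counts, whereas
# `du -b` counts the byte count itself. Filesystems that store small
# files inline or pack file tails may allocate differently.
blocks_used() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

# Print both figures for one file using GNU stat format sequences:
# %s is the apparent size in bytes, %b the number of allocated blocks,
# and %B the size in bytes of each block reported by %b.
compare_file() {
    stat -c '%n: apparent %s bytes, allocated %b x %B bytes' -- "$1"
}

# Worked example from the text: a 1234-byte file with 4K blocks.
echo "1234 bytes -> $(blocks_used 1234 4096) bytes counted by plain du"
```

Running compare_file on any compressed man page shows both numbers side by side; summed over thousands of small files, this rounding is why the plain du totals above are so much larger than the du -b ones.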
