Hello,

On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote:
> On Sun, 11 May 2014 18:26:32 -0500
> Gordon Pettey <[email protected]> wrote:
> 
> > A lot of small files (e.g. AUTHORS, ChangeLog
> > 
> > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
> > /usr/share/doc. A short script to decompress those and recompress with xz
> > -6e reduced that to 36M.
> 
> Very strange o_O 
> 
> Here are my test results. xz options: "--lzma2=preset=6e,dict=4MiB".
> A larger dictionary size does not improve the compression ratio; I get
> even worse results with just "-6e" or "-9e". man-bz2 is a full copy of
> my /usr/share/man, man-xz is a recompressed one.
> 
> Size comparison:
> 
> $ du -s man-bz2/ man-xz/
> 82032 man-bz2/
> 82308 man-xz/

Please consider that by default du reports allocated disk usage (whole
blocks), not the byte size of the files. That means that if a file is
actually 1234 bytes large, without -b it will still be accounted for as
4096 bytes on a filesystem with 4K blocks.
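The difference is easy to see on a single small file by comparing GNU du's
apparent size (-b) with its allocated size in bytes (-B1). This is a minimal
sketch assuming GNU coreutils; the function names are made up for
illustration, and the allocated figure depends on the filesystem (block
size, compression, inlining of small files), so 4096 is typical rather than
guaranteed.

```shell
#!/bin/sh
# Sketch: apparent size vs. allocated disk usage of one file.
# Assumes GNU coreutils du; function names are hypothetical.

apparent_bytes() {
    # -b = --apparent-size --block-size=1: length of the file contents
    du -b -- "$1" | cut -f1
}

allocated_bytes() {
    # -B1 reports the space actually allocated, in bytes (whole blocks)
    du -B1 -- "$1" | cut -f1
}

tmp=$(mktemp)
head -c 1234 /dev/urandom > "$tmp"    # a 1234-byte file
echo "apparent:  $(apparent_bytes "$tmp") bytes"
echo "allocated: $(allocated_bytes "$tmp") bytes"   # often 4096 with 4K blocks
rm -f -- "$tmp"
```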

Here are my results:

1. With bzip2 -9:
find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -bhc --files0-from -
63M
find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -hc --files0-from -
146M

find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -bhc --files0-from -
151M    total
find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -hc --files0-from -
249M    total

2. With xz -9e:
find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -bhc --files0-from -
64M
find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -hc --files0-from -
146M

find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -bhc --files0-from -
147M    total
find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -hc --files0-from -
245M    total

As one can see, on man pages xz is slightly worse in apparent file size
and makes no difference in real disk usage. On docs xz is better by both
measures.
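Both totals for each format can be collected in one pass with a small loop
around the same find/du pipeline used above. This is a sketch assuming GNU
find and GNU du (--files0-from); the function names are hypothetical, and it
prints exact byte counts instead of du -h's rounded figures, so small
differences such as 63M vs. 64M are easier to compare.

```shell
#!/bin/sh
# Sketch: total apparent size and disk usage of all *.SUFFIX files under
# each directory. Assumes GNU find and GNU du (--files0-from).

total_apparent() {
    # $1 = directory, $2 = suffix (bz2, xz, ...); prints total bytes
    find "$1" -type f -name "*.$2" -print0 \
        | du -bc --files0-from=- | tail -n 1 | cut -f1
}

total_disk() {
    # Same file set, but counting allocated blocks (reported in bytes)
    find "$1" -type f -name "*.$2" -print0 \
        | du -B1 -c --files0-from=- | tail -n 1 | cut -f1
}

# Default to the directories measured above when no arguments are given
[ "$#" -eq 0 ] && set -- /usr/share/man /usr/share/doc

for dir in "$@"; do
    for ext in bz2 xz; do
        printf '%s *.%s: apparent %s bytes, on disk %s bytes\n' "$dir" "$ext" \
            "$(total_apparent "$dir" "$ext")" "$(total_disk "$dir" "$ext")"
    done
done
```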

As for decompression speed, xz is about twice as fast as bzip2 for large man
pages (bash, mplayer, cmake, zshall), though this speed gain still needs to be
measured directly with the bunzip2 and unxz applications. I'll publish
statistically meaningful results later; both scripting and testing require time.
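A direct measurement could start from a loop that decompresses the same page
repeatedly to /dev/null and reports the elapsed wall-clock time. This is only
a sketch: it assumes GNU date (for %N) and that bunzip2 and unxz are
installed; the function name, the repetition count and the example paths are
arbitrary, and a single run like this is not statistically meaningful on its
own.

```shell
#!/bin/sh
# Sketch: wall-clock time for N decompressions of one file to /dev/null.
# Assumes GNU date (%N = nanoseconds); time_decompress is a hypothetical name.

time_decompress() {
    # $1 = decompressor (e.g. bunzip2 or unxz), $2 = file, $3 = repetitions
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$3" ]; do
        "$1" -c -- "$2" > /dev/null    # decompress to stdout, discard output
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))    # elapsed milliseconds
}

# Example (illustrative paths; the xz copy is made from the bzip2 page):
#   bunzip2 -c /usr/share/man/man1/bash.1.bz2 | xz -9e > /tmp/bash.1.xz
#   time_decompress bunzip2 /usr/share/man/man1/bash.1.bz2 100
#   time_decompress unxz /tmp/bash.1.xz 100
```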

Best regards,
Andrew Savchenko
