A lot of small files (e.g. AUTHORS, ChangeLog

FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
/usr/share/doc. A short script to decompress those and recompress with xz
-6e reduced that to 36M. I don't have a comparison for individual file
differences.
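
For anyone who wants per-file numbers as well as totals, here is a rough Python sketch of the same measurement. It is not the script from the gist; the function name compare_tree is made up for illustration, and it assumes Python's lzma preset 6 with the PRESET_EXTREME flag corresponds to xz -6e. It counts compressed bytes in memory, so it ignores filesystem block rounding and will not match du output exactly.

```python
import bz2
import lzma
from pathlib import Path

# Assumption: preset 6 combined with PRESET_EXTREME corresponds to `xz -6e`.
XZ_PRESET = 6 | lzma.PRESET_EXTREME


def compare_tree(root):
    """Recompress every *.bz2 file under root with xz, in memory only.

    Returns (bz2_bytes, xz_bytes, xz_larger), where xz_larger counts the
    files whose xz output is bigger than the original bz2 file.
    Nothing on disk is modified.
    """
    bz2_bytes = xz_bytes = xz_larger = 0
    for path in sorted(Path(root).rglob("*.bz2")):
        if not path.is_file():
            continue
        packed = path.read_bytes()
        raw = bz2.decompress(packed)
        repacked = lzma.compress(raw, preset=XZ_PRESET)
        bz2_bytes += len(packed)
        xz_bytes += len(repacked)
        if len(repacked) > len(packed):
            xz_larger += 1
    return bz2_bytes, xz_bytes, xz_larger
```

Calling compare_tree("/usr/share/man") returns the two totals plus the number of files that got bigger under xz, which is the per-file figure missing from my numbers above.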

I posted the short bash scripts at
https://gist.github.com/petteyg/96c71fa3c4680552f5c4



On Sun, May 11, 2014 at 4:27 PM, Pacho Ramos wrote:

> On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote:
> > Hello, developers.
> >
> > I'd like to raise the following item for discussion: making .xz
> > the default compressor used by portage for documentation, man pages
> > and info files. That is, the equivalent of:
> >
> >   PORTAGE_COMPRESS=xz
> >
> > in make.globals.
> >
> > Rationale: xz-utils is quite widespread nowadays and it is a part
> > of @system set. It can achieve better compression ratio than bzip2,
> > and faster decompression at the same time.
> >
> > I have confirmed that both sys-apps/man and sys-apps/man-db can
> > handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
> > compressed info pages. Major text editors and pagers support .xz
> > just like .bz2 (i.e. usually they support both or neither :)).
> >
> > The additional question is: what preset to use? To help discussing
> > this, I'd like to quote the tables from 'man xz':
> >
> >      Preset   DictSize   CompCPU   CompMem   DecMem
> >        -0     256 KiB       0        3 MiB    1 MiB
> >        -1       1 MiB       1        9 MiB    2 MiB
> >        -2       2 MiB       2       17 MiB    3 MiB
> >        -3       4 MiB       3       32 MiB    5 MiB
> >        -4       4 MiB       4       48 MiB    5 MiB
> >        -5       8 MiB       5       94 MiB    9 MiB
> >        -6       8 MiB       6       94 MiB    9 MiB
> >        -7      16 MiB       6      186 MiB   17 MiB
> >        -8      32 MiB       6      370 MiB   33 MiB
> >        -9      64 MiB       6      674 MiB   65 MiB
> >
> >      Preset   DictSize   CompCPU   CompMem   DecMem
> >       -0e     256 KiB       8        4 MiB    1 MiB
> >       -1e       1 MiB       8       13 MiB    2 MiB
> >       -2e       2 MiB       8       25 MiB    3 MiB
> >       -3e       4 MiB       7       48 MiB    5 MiB
> >       -4e       4 MiB       8       48 MiB    5 MiB
> >       -5e       8 MiB       7       94 MiB    9 MiB
> >       -6e       8 MiB       8       94 MiB    9 MiB
> >       -7e      16 MiB       8      186 MiB   17 MiB
> >       -8e      32 MiB       8      370 MiB   33 MiB
> >       -9e      64 MiB       8      674 MiB   65 MiB
> >
> > I'd like to note here that increasing the dictionary size beyond the
> > file size does not improve compression. However, the options involved
> > in CompCPU may.
> >
> > Depending on the expected amount of complexity, I'd either go for:
> >
> > 1) -6e (or -6, the default) -- max CompCPU, reasonable use of memory,
> > and dictionary larger than most (or all?) documents that are going to
> > be compressed,
> >
> > 2) -Ne with minimal 'N' for CompCPU==8 and DictSize > filesize -- still
> > max compression ratio while keeping lowest memory requirements possible.
> >
> > Your thoughts?
> >
>
> Per:
> https://bugs.gentoo.org/show_bug.cgi?id=372653
>
> Looks like bzip2 was still better for small files :/
>
>
>

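Option 2 in the quoted message (minimal N with CompCPU==8 and DictSize > filesize) could be sketched roughly as below. The DictSize and CompCPU values are copied from the quoted 'man xz' table; the names EXTREME_PRESETS, pick_extreme_preset and compress_doc are invented for this example, and falling back to -9e for files larger than every dictionary is my own assumption, not something the proposal specifies.

```python
import lzma

# (DictSize in KiB, CompCPU) for -0e .. -9e, from the quoted `man xz` table.
EXTREME_PRESETS = [
    (256, 8), (1024, 8), (2048, 8), (4096, 7), (4096, 8),
    (8192, 7), (8192, 8), (16384, 8), (32768, 8), (65536, 8),
]


def pick_extreme_preset(file_size):
    """Smallest N with CompCPU == 8 and DictSize > file_size (in bytes)."""
    for n, (dict_kib, comp_cpu) in enumerate(EXTREME_PRESETS):
        if comp_cpu == 8 and dict_kib * 1024 > file_size:
            return n
    # Assumption: files larger than every dictionary just get -9e.
    return 9


def compress_doc(data):
    """Compress bytes to .xz format using the preset chosen for their size."""
    preset = pick_extreme_preset(len(data)) | lzma.PRESET_EXTREME
    return lzma.compress(data, preset=preset)
```

With those table values, -3e and -5e are never picked, since their CompCPU is 7. A 3 MiB file ends up at -4e and a 5 MiB file at -6e, while typical man pages well under 256 KiB stay at -0e.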