Hello, developers.

I'd like to raise the following item for discussion: making xz
the default compressor used by Portage for documentation, man pages
and info files. That is, the equivalent of:

  PORTAGE_COMPRESS=xz

in make.globals.

Rationale: xz-utils is quite widespread nowadays and is part of
the @system set. It can achieve a better compression ratio than bzip2,
with faster decompression at the same time.

I have confirmed that both sys-apps/man and sys-apps/man-db can
handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
compressed info pages. Major text editors and pagers support .xz
just as they support .bz2 (i.e. usually they support both or neither :)).

The additional question is: which preset to use? To help discuss
this, I'd like to quote the tables from 'man xz':

     Preset   DictSize   CompCPU   CompMem   DecMem
       -0     256 KiB       0        3 MiB    1 MiB
       -1       1 MiB       1        9 MiB    2 MiB
       -2       2 MiB       2       17 MiB    3 MiB
       -3       4 MiB       3       32 MiB    5 MiB
       -4       4 MiB       4       48 MiB    5 MiB
       -5       8 MiB       5       94 MiB    9 MiB
       -6       8 MiB       6       94 MiB    9 MiB
       -7      16 MiB       6      186 MiB   17 MiB
       -8      32 MiB       6      370 MiB   33 MiB
       -9      64 MiB       6      674 MiB   65 MiB 

     Preset   DictSize   CompCPU   CompMem   DecMem
      -0e     256 KiB       8        4 MiB    1 MiB
      -1e       1 MiB       8       13 MiB    2 MiB
      -2e       2 MiB       8       25 MiB    3 MiB
      -3e       4 MiB       7       48 MiB    5 MiB
      -4e       4 MiB       8       48 MiB    5 MiB
      -5e       8 MiB       7       94 MiB    9 MiB
      -6e       8 MiB       8       94 MiB    9 MiB
      -7e      16 MiB       8      186 MiB   17 MiB
      -8e      32 MiB       8      370 MiB   33 MiB
      -9e      64 MiB       8      674 MiB   65 MiB

I'd like to note here that increasing the dictionary size beyond the
file size does not improve compression. However, the options reflected
in CompCPU may.
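
One way to check the dictionary-size claim on a real man page is to
compress the same data with several dictionary sizes and compare the
results. The following is only a sketch, assuming Python's standard
lzma module (which wraps the same liblzma as xz); the function name
compressed_sizes is made up for illustration and is not part of portage.

```python
import lzma


def compressed_sizes(data, dict_sizes, preset=6 | lzma.PRESET_EXTREME):
    """Map each dictionary size (in bytes) to the resulting .xz size.

    All other encoder options are taken from the given preset, so only
    the dictionary size varies between runs.
    """
    sizes = {}
    for dict_size in dict_sizes:
        filters = [{
            "id": lzma.FILTER_LZMA2,
            "preset": preset,
            "dict_size": dict_size,
        }]
        blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
        # Sanity check: the output must decompress to the original data.
        assert lzma.decompress(blob) == data
        sizes[dict_size] = len(blob)
    return sizes
```

Feeding it the contents of an uncompressed man page together with, say,
the 1 MiB and 8 MiB dictionary sizes from the tables shows whether the
larger dictionary buys anything for that file.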

Depending on how much complexity we are willing to accept, I'd go for either:

1) -6e (or -6, the default) -- max CompCPU, reasonable use of memory,
and dictionary larger than most (or all?) documents that are going to
be compressed,

2) -Ne with minimal 'N' for CompCPU==8 and DictSize > filesize -- still
max compression ratio while keeping lowest memory requirements possible.
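
Option 2 can be sketched as a simple table lookup. This is a minimal
illustration only: the preset data is copied from the 'man xz' tables
quoted above, and the function name pick_extreme_preset is hypothetical,
not an existing portage interface.

```python
KIB = 1024
MIB = 1024 * KIB

# (N, DictSize in bytes, CompCPU of the -Ne preset), from the table above.
EXTREME_PRESETS = [
    (0, 256 * KIB, 8),
    (1, 1 * MIB, 8),
    (2, 2 * MIB, 8),
    (3, 4 * MIB, 7),
    (4, 4 * MIB, 8),
    (5, 8 * MIB, 7),
    (6, 8 * MIB, 8),
    (7, 16 * MIB, 8),
    (8, 32 * MIB, 8),
    (9, 64 * MIB, 8),
]


def pick_extreme_preset(file_size):
    """Return the smallest '-Ne' with CompCPU == 8 and DictSize > file_size."""
    for n, dict_size, comp_cpu in EXTREME_PRESETS:
        if comp_cpu == 8 and dict_size > file_size:
            return f"-{n}e"
    # No dictionary is larger than the file; fall back to the largest preset.
    return "-9e"
```

For a 40 KiB man page, pick_extreme_preset(40 * 1024) returns "-0e";
a 3 MiB document skips -3e (CompCPU 7) and gets "-4e".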

Your thoughts?

-- 
Best regards,
Michał Górny
