bzip2

Michael Mol Wed, 26 Sep 2012 13:30:41 -0700

A few months ago, I filed bug 423651 to ask that bzip2 on the install
media be replaced with pbzip2. It was closed a short while later with a
note that the change would involve changing what's kept in @system, and
that this had to be discussed here rather than in a bug report.


Here's a detailed description of how pbzip2 operates, as described by
a friend of mine:

> pbzip2's compression routine splits the input into blocks (with a
> default of 900,000 bytes), which it then feeds into the standard bzip2
> compression routine. The outputs of the various calls to the bzip2
> compression routine are then concatenated together. The end result is
> the same as if you had first used the "split" command on the input,
> run individual bzip2 commands on the split pieces, then recombined the
> individual bz2 files using cat.
>
> The downside to this is that you have multiple file headers, footers,
> and byte-alignment padding, plus the fact that bzip2 does an RLE
> compression stage to fill the buffer it feeds to the BWT, the main part
> of the compression routine. If you happen to have a section with 1 MiB
> of the same byte, the pbzip2 front-end will split that into two blocks
> (at the default settings) and feed them to separate bzip2 compressors.
> bzip2 will then compress the first block down to a buffer of about
> 17 KiB before passing it on to be compressed further, and the rest of
> the data would have fit within this block if pbzip2 hadn't split it
> the way it had.
>
> As for decompression, pbzip2 can only really do parallel decompression
> of files that it created, since it searches for the bz2 file header in
> order to split the file among different workers. One reason for this is
> that the bz2 block header is not byte aligned.
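
To make the description above concrete, here is a rough sketch using
Python's standard bz2 module. It is not pbzip2's code and it runs
sequentially; it only mimics the output layout described above
(independent bz2 streams concatenated together). The names
compress_in_blocks and count_streams are made up for this illustration,
and the 900,000-byte block size is the default quoted above.

```python
import bz2

BLOCK_SIZE = 900_000  # default block size quoted in the description above


def compress_in_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress each block as its own bz2 stream and concatenate the results.

    Hypothetical helper: this is the "split, bzip2 each piece, cat" layout,
    done sequentially rather than in parallel.
    """
    if not data:
        return bz2.compress(b"")
    streams = [
        bz2.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
    return b"".join(streams)


def count_streams(compressed: bytes) -> int:
    """Count the independent bz2 streams in a concatenated file."""
    count = 0
    while compressed:
        decompressor = bz2.BZ2Decompressor()
        decompressor.decompress(compressed)
        # Whatever follows the end of this stream belongs to the next one.
        compressed = decompressor.unused_data
        count += 1
    return count


if __name__ == "__main__":
    # 1 MiB of the same byte, as in the example above.
    data = b"\x00" * (1024 * 1024)

    whole = bz2.compress(data)        # one stream
    split = compress_in_blocks(data)  # two streams at the default block size

    # bz2.decompress accepts concatenated streams, like the bzip2 tool does.
    assert bz2.decompress(split) == data

    print("single stream:", len(whole), "bytes,", count_streams(whole), "stream")
    print("split streams:", len(split), "bytes,", count_streams(split), "streams")
```

On highly repetitive input like this, the split version should come out
larger, because every stream carries its own header and footer. On
typical data the difference is much smaller relative to the file size.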

I really don't know how to carry this discussion any further than
this; I'll answer any questions I can.

-- 
:wq

[gentoo-dev] ship app-arch/pbzip2 instead of app-arch/bzip2
