A few months ago, I filed bug 423651 to ask that bzip2 on the install media be replaced with pbzip2. It was closed a short while later with a note that the change would involve altering what's kept in @system, and that this had to be discussed here rather than in a bug report.
Here's a detailed description of how pbzip2 operates, as described by a friend of mine:

> pbzip2's compression routine splits the input into blocks (with a default
> of 900,000 bytes), which it then feeds into the standard bzip2 compression
> routine. The output of the various calls to the bzip2 compression routine
> are then concatenated together. The end result is the same as if you had
> first used the "split" command on the input, run individual bzip2 commands
> on the split pieces, then recombined the individual bz2 files using cat.
>
> The down side to this is that you have multiple file headers, footers, and
> byte-align padding, plus the fact that bzip2 does a RLE compression stage
> to fill the buffer it feeds to the BWT, the main part of the compression
> routine. If you happen to have a section with 1MiB of the same byte, the
> pbzip2 front-end will split that into two blocks (at the default settings)
> and feed them to separate bzip2 compressors. bzip2 will then compress the
> first block down to a buffer of about 17kiB before passing it on to be
> compressed further, and the rest of the data would have fit within this
> block, if pbzip2 hadn't split it the way it had.
>
> As for decompression, pbzip2 can only really do parallel decompression of
> files that it created, since it seeks for the bz2 file header in order to
> split it to different workers. One reason for this is that the bz2 block
> header is not byte aligned.

I really don't know how to carry this discussion any further than this; I'll answer any questions I can.

--
:wq
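P.S. For anyone who wants to poke at the "split, compress, cat" equivalence described above, here is a minimal sketch. It uses Python's standard bz2 module rather than pbzip2 itself, it runs serially instead of in parallel, and the helper name chunked_compress is made up for illustration. It only mimics the output layout my friend describes (one independent bzip2 stream per 900,000-byte block, concatenated), using the 1MiB-of-one-byte case from the quote.

```python
import bz2

# Default block size quoted in the description above (assumed here, not read
# from pbzip2 itself).
BLOCK_SIZE = 900_000


def chunked_compress(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hypothetical helper: compress each block as its own bzip2 stream,
    then concatenate the streams, like split + bzip2 + cat."""
    return b"".join(
        bz2.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    )


# 1 MiB of the same byte, the worst case mentioned in the quote.
data = b"\0" * (1024 * 1024)

single = bz2.compress(data)      # one stream, as plain bzip2 would write
multi = chunked_compress(data)   # two streams, one per 900,000-byte block

# Concatenated streams still decompress to the original input.
assert bz2.decompress(multi) == data

print("single stream:", len(single), "bytes")
print("concatenated :", len(multi), "bytes")
```

The concatenated output carries a second set of stream headers and footers, which is the overhead the quote is talking about. On ordinary data the difference should be much less dramatic than in this contrived case.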