Here is a parallel implementation of xzcat:

http://git.annexia.org/?p=pxzcat.git;a=tree

Some test results:

  4 cores:  xzcat: 23.8 s  pxzcat:  8.1 s  speed-up: 2.94
  8 cores:  xzcat: 26.8 s  pxzcat: 10.5 s  speed-up: 2.55
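
For anyone wanting to redo the arithmetic, here is a small Python
sketch.  The speed-up is just the ratio of the two wall-clock times.
The "efficiency" figure (speed-up divided by core count) is a derived
number, not something that was measured separately, and it assumes
that all cores were actually available to pxzcat during the run.

```python
# (cores, xzcat seconds, pxzcat seconds), copied from the results above
RESULTS = [(4, 23.8, 8.1), (8, 26.8, 10.5)]


def speedup(serial_s, parallel_s):
    """Wall-clock speed-up of the parallel run over the serial one."""
    return serial_s / parallel_s


def efficiency(serial_s, parallel_s, cores):
    """Fraction of ideal linear scaling achieved (1.0 would be perfect)."""
    return speedup(serial_s, parallel_s) / cores


for cores, xzcat_s, pxzcat_s in RESULTS:
    s = speedup(xzcat_s, pxzcat_s)
    e = efficiency(xzcat_s, pxzcat_s, cores)
    print(f"{cores} cores: speed-up {s:.2f}, efficiency {e:.0%}")
```

This prints a speed-up of 2.94 (73% efficiency) for the 4 core
machine and 2.55 (32% efficiency) for the 8 core machine.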

I just wrote this as a quick hack in a couple of hours, so while it
may be of interest, it's not a long-term solution.  (It would be better
to get the xzcat -T flag working.)

Notes on functionality/limitations:

(1) pxzcat **WILL NOT WORK** unless the xz file was compressed using
the --block-size parameter with a smallish block size (e.g. 16 megabytes).

(2) I have not tested it with multi-stream files, but it should work
with them.

(3) It requires that the input and output files are real files.  It
does not work for streaming.
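
The following Python sketch illustrates the general technique behind
limitations (1) and (3).  It is not pxzcat's actual code.  Parallel
decompression needs pieces that can be decoded independently, and
each worker has to read and write at a known offset, which
presumably is why seekable files are needed.  Python's lzma module
does not expose xz block boundaries, so the sketch substitutes
independent xz streams for blocks and keeps its own toy index.  CHUNK,
compress_in_chunks and parallel_decompress are made-up names, and
os.pread / os.pwrite are only available on Unix.

```python
import lzma
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # hypothetical 1 MiB chunk size, small only to keep the demo quick


def compress_in_chunks(data, path):
    """Write each chunk as an independent xz stream (a stand-in for xz blocks).

    Returns a toy index of (compressed offset, compressed length,
    uncompressed offset) tuples, one per chunk.
    """
    index = []
    comp_off = 0
    with open(path, "wb") as f:
        for unc_off in range(0, len(data), CHUNK):
            blob = lzma.compress(data[unc_off:unc_off + CHUNK])
            f.write(blob)
            index.append((comp_off, len(blob), unc_off))
            comp_off += len(blob)
    return index


def parallel_decompress(src, dst, index, workers=4):
    """Decompress every chunk in its own task, writing at its final offset."""
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)

    def one_chunk(entry):
        comp_off, comp_len, unc_off = entry
        # Positioned read and write: neither is possible on a pipe.
        blob = os.pread(in_fd, comp_len, comp_off)
        os.pwrite(out_fd, lzma.decompress(blob), unc_off)

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(one_chunk, index))  # list() re-raises worker errors
    finally:
        os.close(in_fd)
        os.close(out_fd)


if __name__ == "__main__":
    payload = b"libguestfs " * 300_000  # a few megabytes of sample data
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.xz")
        dst = os.path.join(tmp, "demo.out")
        idx = compress_in_chunks(payload, src)
        parallel_decompress(src, dst, idx)
        with open(dst, "rb") as f:
            print(len(idx), "chunks, round trip ok:", f.read() == payload)
```

With a file that is one single large block there is only one entry
in such an index, so there is nothing to hand out to the other
workers.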

Notes on performance:

- Scalability is not too bad on my laptop (the 4 core machine above),
but much worse on a theoretically higher-performing machine with SSDs
(the 8 core machine above).  I don't really understand why that is.

- For reasons I don't understand, both regular xzcat and pxzcat cause
the output file to be flushed to disk after the program exits.  This
slows down any program which then consumes the output file.
Indeed, virt-builder (for which I wrote this) actually slows down a
lot when using pxzcat.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
