On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:
Nope, a gzip or bzip2 file only contains a single file.  To zip several
files, you first make a tar archive, and then you run gzip or bzip2 on
it.  That's why most compressed archives targeted at the Linux platform
have extensions like .tar.gz, .tar.bz2, and so on.

-Lars

This is **exactly** my point. These single-file gzip and bzip2 files should be usable with exactly the same API as uncompressed file I/O. My personal use case for this is files that contain large amounts of DNA sequence. Such data compresses very well, since apart from a little meta-info it's just a bunch of A's, C's, G's and T's. I want to be able to read in these huge files and decompress them transparently on the fly.
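To make "the same API as uncompressed file I/O" concrete, here is a minimal sketch in Python rather than D, chosen only because Python's standard `gzip.open` already returns an object that behaves like an ordinary file. The helper names `open_text` and `count_bases`, the `.gz` suffix check, and the FASTA-like layout (meta-info lines starting with `>`) are assumptions made for illustration. None of this is an existing Phobos interface.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file with the same file API.

    Assumption for this sketch: compressed files are recognised by a
    ".gz" suffix.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "rt", encoding="ascii")


def count_bases(path):
    """Count A/C/G/T occurrences, streaming the file line by line.

    Lines starting with '>' are treated as meta-info and skipped
    (a FASTA-like layout, assumed here for illustration).
    """
    counts = Counter()
    with open_text(path) as handle:
        for line in handle:  # decompression happens on the fly
            if line.startswith(">"):
                continue
            counts.update(ch for ch in line.strip().upper() if ch in "ACGT")
    return counts
```

The caller never sees the difference: `count_bases("reads.fa")` and `count_bases("reads.fa.gz")` (hypothetical file names) go through the same loop, and only `open_text` knows whether decompression is involved.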

Another example (and the one that brought the subject of these non-tarred gzips to my attention) is the svgz format. This is an image format, and is literally just a gzipped SVG. Uncompressed SVG is a ridiculously bloated format but compresses very well, so the SVG standard requires that gzipped SVG files "just work" transparently with any SVG-compliant program. I recently added svgz support to plot2kill, and it was somewhat of a PITA because I had to find the C API buried in etc.c.zlib and then I got stuck using it instead of a nice D API.
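The svgz case can be sketched the same way. The example below is again Python rather than D, and `load_svg` is a hypothetical helper, not part of plot2kill or any library. It sniffs the two-byte gzip magic number (`1f 8b`) instead of trusting the file extension, then hands either kind of stream to the same XML parser.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_svg(path):
    """Parse an SVG file, whether it is plain (.svg) or gzipped (.svgz)."""
    # Peek at the first two bytes to decide whether the file is gzipped.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    # Both openers return a binary file-like object, so the parser
    # does not need to know which one it was given.
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as stream:
        return ET.parse(stream).getroot()
```

The point of the sketch is that the only format-specific code is the two lines choosing the opener. Everything downstream works on a generic stream.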

The bigger point, though, is that use cases for non-tarred single-file gzips do exist and they should be handled transparently via an interface identical to normal file I/O.
