Re: GZip File Reading

dsimcha Thu, 10 Mar 2011 06:27:49 -0800

On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:

Nope, a gzip or bzip2 file only contains a single file.  To zip several
files, you first make a tar archive, and then you run gzip or bzip2 on
it.  That's why most compressed archives targeted at the Linux platform
have extensions like .tar.gz, .tar.bz2, and so on.


-Lars

This is **exactly** my point. These single-file gzip and bzip2 files should be usable with exactly the same API as uncompressed file I/O. My personal use case for this is files that contain large amounts of DNA sequence. This compresses very well, since besides a little meta-info it's just a bunch of A's, C's, G's and T's. I want to be able to read in these huge files and decompress them transparently on the fly.

Another example (and the one that brought the subject of these non-tarred gzips to my attention) is the svgz format. This is an image format, and is literally just a gzipped SVG. Uncompressed SVG is a ridiculously bloated format but compresses very well, so the SVG standard requires that gzipped SVG files "just work" transparently with any SVG-compliant program. I recently added svgz support to plot2kill, and it was somewhat of a PITA because I had to find the C API buried in etc.c.zlib and then I got stuck using it instead of a nice D API.

The bigger point, though, is that use cases for non-tarred single-file gzips do exist and they should be handled transparently via an interface identical to normal file I/O.
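
To make "an interface identical to normal file I/O" concrete, here is a minimal sketch of the idea. It is written in Python only because its standard gzip module already behaves this way; it is not the D API under discussion. The names open_text and count_bases are hypothetical, and the '>' header convention is an assumption about the sequence file layout.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def open_text(path, encoding="utf-8"):
    """Hypothetical helper: open a plain or gzipped text file for reading."""
    # Sniff the first two bytes instead of trusting the file extension.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # In text mode, gzip.open returns a file-like object that
        # decompresses incrementally as the caller reads from it.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


def count_bases(path):
    """Count A/C/G/T in a sequence file, skipping '>' header lines
    (an assumed FASTA-like layout)."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    with open_text(path) as f:
        # The loop is the same whether or not the file is compressed.
        for line in f:
            if line.startswith(">"):
                continue
            for base in line.strip().upper():
                if base in counts:
                    counts[base] += 1
    return counts
```

The caller never sees whether the data on disk was compressed, and the same helper would open an .svgz file as readily as a plain .svg.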
