On Thu, 10 Mar 2011 09:20:34 -0500, dsimcha wrote:

> On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:
>> Nope, a gzip or bzip2 file only contains a single file. To zip several
>> files, you first make a tar archive, and then you run gzip or bzip2 on
>> it. That's why most compressed archives targeted at the Linux platform
>> have extensions like .tar.gz, .tar.bz2, and so on.
>>
>> -Lars
>
> This is **exactly** my point. These single-file gzip and bzip2 files
> should be usable with exactly the same API as uncompressed file I/O. My
> personal use case for this is files that contain large amounts of DNA
> sequence. This compresses very well, since besides a little meta-info
> it's just a bunch of A's, C's, G's and T's. I want to be able to read
> in these huge files and decompress them transparently on the fly.
>
> Another example (and the one that brought the subject of these
> non-tarred gzips to my attention) is the svgz format. This is an image
> format, and is literally just a gzipped SVG. Uncompressed SVG is a
> ridiculously bloated format but compresses very well, so the SVG
> standard requires that gzipped SVG files "just work" transparently with
> any SVG-compliant program. I recently added svgz support to plot2kill,
> and it was somewhat of a PITA because I had to find the C API buried in
> etc.c.zlib and then I got stuck using it instead of a nice D API.
>
> The bigger point, though, is that use cases for non-tarred single-file
> gzips do exist and they should be handled transparently via an interface
> identical to normal file I/O.

Although I agree this would be nice, I don't think std.stdio.File is the right place to put it. I think a general streaming framework should be in place first, and File be made to work with it. Then, working with a gzipped/bzipped file should be as simple as wrapping the raw File stream in a compression/decompression stream.

-Lars
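
As a rough illustration of the layering idea (a raw file stream wrapped in a decompression stream, with the consumer unaware of the difference), here is a minimal sketch in Python rather than D, chosen only because its standard library already stacks streams this way. It is not a proposal for the Phobos API. The names `layered` and `count_bases` are hypothetical, and detecting gzip by its two leading magic bytes is an assumption of this sketch, not something specified in the thread.

```python
import gzip
import io


def layered(raw):
    """Stack optional gzip decompression and text decoding on a raw binary stream."""
    magic = raw.read(2)   # every gzip member starts with the bytes 1f 8b
    raw.seek(0)           # rewind so the next layer sees the whole file
    if magic == b"\x1f\x8b":
        # Decompression layer: reads compressed bytes from `raw` on demand.
        binary = gzip.GzipFile(fileobj=raw, mode="rb")
    else:
        # Already uncompressed: pass the raw stream through unchanged.
        binary = raw
    # Text layer: the consumer iterates over lines either way.
    return io.TextIOWrapper(binary, encoding="ascii")


def count_bases(path):
    """Count A, C, G and T in a sequence file, compressed or not."""
    counts = dict.fromkeys("ACGT", 0)
    # `raw` is the plain file stream; `stream` is the layered view of it.
    with open(path, "rb") as raw, layered(raw) as stream:
        for line in stream:
            for ch in line.strip():
                if ch in counts:
                    counts[ch] += 1
    return counts
```

Calling `count_bases` on a plain text file and on its gzipped copy should give the same counts, which is the "identical interface" property discussed above: only `layered` knows whether a decompression stream sits in the stack.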
