It shouldn't be a problem for Hive to support bz2-compressed files (by defining your own input/output file format that does the decompression on the fly), but we won't be able to parallelize the execution as we do with uncompressed text files and sequence files. bz2 compression is not splittable, so each .bz2 file has to be read from start to end by a single mapper.
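To make the splittability point concrete, here is a toy Python sketch. It is not Hadoop's actual SequenceFile layout. The sync-marker value, the block framing, zlib as the per-block codec, and the function names `write_blocks`, `read_split` and `single_stream_split_fails` are all hypothetical. When the whole file is one compressed stream, a reader handed a byte range from the middle has no valid place to start decoding. When the file is a series of independently compressed blocks, each preceded by a sync marker, a reader can skip forward to the first marker at or after its start offset. It then keeps reading blocks that begin before its end offset, which is roughly how split reading works for block-compressed sequence files.

```python
import bz2
import struct
import zlib

# Hypothetical 16-byte sync marker. Real SequenceFiles pick a random marker per
# file; this fixed value is only for illustration.
SYNC = bytes.fromhex("8f3a5c1e9b7d2f4a6c0e8b1d3f5a7c9e")
HEADER = len(SYNC) + 4  # sync marker + 4-byte big-endian payload length


def write_blocks(records, records_per_block=3):
    """Toy block-compressed file: each block is [SYNC][length][zlib payload].

    Records are joined with newlines, so they must not contain newlines.
    """
    out = bytearray()
    for i in range(0, len(records), records_per_block):
        payload = zlib.compress("\n".join(records[i:i + records_per_block]).encode())
        out += SYNC + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def read_split(data, start, end):
    """Return the records of every block whose sync marker begins in [start, end).

    A split that starts mid-block skips forward to the next sync marker; the
    split before it finishes that block even if it runs past its own end.
    """
    records = []
    pos = data.find(SYNC, start)  # -1 if no block starts at or after `start`
    while 0 <= pos < min(end, len(data)):
        (length,) = struct.unpack(">I", data[pos + len(SYNC):pos + HEADER])
        payload = data[pos + HEADER:pos + HEADER + length]
        records.extend(zlib.decompress(payload).decode().split("\n"))
        pos += HEADER + length  # the next block's sync marker starts here
    return records


def single_stream_split_fails(records):
    """Compress everything as one bz2 stream and try to decode only its second half."""
    data = bz2.compress("\n".join(records).encode())
    try:
        bz2.decompress(data[len(data) // 2:])
    except (OSError, ValueError):
        return True  # a mid-stream offset is not a valid place to start decoding
    return False


if __name__ == "__main__":
    rows = [f"row{i}\tvalue{i}" for i in range(10)]
    data = write_blocks(rows)
    mid = len(data) // 2
    # Two "mappers" each get half the bytes; together they see every row once.
    assert read_split(data, 0, mid) + read_split(data, mid, len(data)) == rows
    print(single_stream_split_fails(rows))
```

The split boundaries here are arbitrary byte offsets, much like HDFS block boundaries. The sync marker is what lets each reader find a block boundary without decompressing everything before it.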
So a better solution is to store the data as compressed sequence files, which save space and are still splittable.

Zheng

On Tue, Dec 2, 2008 at 1:09 AM, Josh Ferguson <[EMAIL PROTECTED]> wrote:
> Whatever happened to the compressed storage format? I'd like to keep
> delimited files in bz2 if possible to save on space, is that sort of thing
> being considered?
>
> Josh

--
Yours,
Zheng
