It shouldn't be a problem for Hive to support it (by defining your own
input/output file format that does the decompression on the fly), but we
won't be able to parallelize the execution as we do with uncompressed text
files and sequence files, since bz2 compression is not splittable.

So a better solution is to store the data in compressed sequence file
format, which saves space and is also splittable.
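
To illustrate what "not splittable" means here, the following is a minimal
Python sketch, not Hive or Hadoop code. It assumes a made-up tab-delimited
data set, and the helper names read_split_plain and read_split_bz2 are
hypothetical. A reader dropped at an arbitrary byte offset of a plain text
file can skip to the next newline and continue, while the same offset into a
single bz2 stream cannot be decompressed with the standard bz2 module.

```python
import bz2

# Hypothetical delimited data standing in for a Hive text table.
rows = "".join(f"{i}\tuser{i}\n" for i in range(10000)).encode()
compressed = bz2.compress(rows)


def read_split_plain(data, start):
    """Read a plain-text split: skip to the next newline, then return whole rows."""
    newline = data.index(b"\n", start)
    return data[newline + 1:].splitlines()


def read_split_bz2(data, start):
    """Try to decompress a bz2 stream from an arbitrary offset; None on failure."""
    try:
        return bz2.decompress(data[start:]).splitlines()
    except (OSError, ValueError):
        # The slice does not begin with a valid bz2 stream header.
        return None


# A second worker starting halfway through the plain file gets usable rows.
plain_rows = read_split_plain(rows, len(rows) // 2)

# The same worker starting halfway through the bz2 file gets nothing.
bz2_rows = read_split_bz2(compressed, len(compressed) // 2)

print(len(plain_rows) > 0, bz2_rows is None)
```

A compressed sequence file avoids this by compressing records or blocks of
records separately and placing sync markers between them, so each worker can
find a safe place to start reading.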

Zheng

On Tue, Dec 2, 2008 at 1:09 AM, Josh Ferguson <[EMAIL PROTECTED]> wrote:

> Whatever happened to the compressed storage format? I'd like to keep
> delimited files in bz2 if possible to save on space, is that sort of thing
> being considered?
>
> Josh
>



-- 
Yours,
Zheng
