Re: compress files in hadoop

Madhu Ramanna Sat, 11 Jun 2011 07:35:13 -0700

Sure,

I wrote a job that runs hourly / daily and produces several files, using
MultipleOutputs to generate them. However, when bz2 compression is turned
on, MultipleOutputs produces 0-byte files for all but one named output
(the part files are 14 bytes each). Without compression, MultipleOutputs
does its job fine. Since the output is all text, compressing it would
save us a lot of disk space.
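
For what it's worth, 14 bytes is exactly the size of a bzip2 stream with
no data in it, so the part files are probably valid but empty archives.
That would be expected if every record goes to a named output. The 0-byte
named-output files are not valid bzip2 streams at all, which might mean
their streams were never written or closed, but that is only a guess. A
quick sketch to check this on files copied off the cluster, using
Python's standard bz2 module (looks_like_empty_bz2 is a helper name made
up for this example, not part of Hadoop):

```python
import bz2

# Compressing zero bytes yields only the bzip2 header and end-of-stream
# footer, with no data blocks in between.
empty_stream = bz2.compress(b"")
print(len(empty_stream))  # 14


def looks_like_empty_bz2(data: bytes) -> bool:
    """Return True if data is a valid bzip2 stream holding no payload.

    Hypothetical helper for illustration only. A 0-byte input is rejected
    because it is not a bzip2 stream at all.
    """
    if not data:
        return False
    try:
        return bz2.decompress(data) == b""
    except (OSError, ValueError):
        # Corrupt or truncated input: not a valid stream.
        return False
```

Reading a 14-byte part file in binary mode and passing its contents to
looks_like_empty_bz2 should return True if the guess above is right.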


Our cluster is cdh3b3 (hadoop-0.20.2).

On 6/10/11 7:26 PM, "Dhruv" <dhru...@gmail.com> wrote:

>Can you be more specific? Tom White's book has a whole section devoted to
>it.
>
>On Fri, Jun 10, 2011 at 7:24 PM, Madhu Ramanna <ma...@buysight.com> wrote:
>
>> Hello,
>>
>> What is the best way to compress several files already in Hadoop?
>>
>>
