Sure. I wrote a job that runs hourly/daily and produces several files, using MultipleOutputs to generate them. However, when compression (bz2) is turned on, MultipleOutputs produces 0-byte files for all but one named output (the part files are 14 bytes). Without compression, MultipleOutputs seems to work fine. Since the output is all text, compressing it would save us a ton of disk space.
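One detail that may help narrow this down: 14 bytes is exactly the size of an empty bzip2 stream, so the part files look like valid but empty compressed files, while a 0-byte file contains no bzip2 stream at all. Below is a minimal Python sketch for telling the two cases apart. It assumes the output files have first been copied to local disk (for example with hadoop fs -get); the function name classify_output is hypothetical and not part of Hadoop.

```python
import bz2
import os

# Size of a bzip2 stream that contains no data (header + end-of-stream marker).
EMPTY_BZ2_SIZE = len(bz2.compress(b""))


def classify_output(path):
    """Classify a local copy of a job output file (hypothetical helper).

    Returns one of:
      "zero-byte"  - nothing was written, not even a bzip2 header
      "empty-bz2"  - a well-formed bzip2 stream holding zero bytes of data
      "has-data"   - anything else
    """
    size = os.path.getsize(path)
    if size == 0:
        return "zero-byte"
    if size == EMPTY_BZ2_SIZE:
        with open(path, "rb") as f:
            raw = f.read()
        try:
            if bz2.decompress(raw) == b"":
                return "empty-bz2"
        except (OSError, ValueError):
            # Same size as an empty stream, but not valid bzip2 data.
            pass
    return "has-data"
```

Running classify_output over each fetched file would show which named outputs never received a compressed stream at all and which received one that was closed without any records.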
Our cluster is cdh3b3 (hadoop-0.20.2).

On 6/10/11 7:26 PM, "Dhruv" <dhru...@gmail.com> wrote:

>Can you be more specific? Tom White's book has a whole section devoted to
>it.
>
>On Fri, Jun 10, 2011 at 7:24 PM, Madhu Ramanna <ma...@buysight.com> wrote:
>
>> Hello,
>>
>> What is the most optimal way to compress several files already in
>>hadoop ?
>>
>>