I used to write zip files from my reducer. It was very fast, and pulling
the files out of HDFS was also very fast.
In part this is because each reducer might need to write 26k individual
files; by writing them as a zip file there was only 1 HDFS file.
The job ran about 15x faster that way.
I don't have the code handy any more, but it was something on the order of
ZipOutputStream zos = new ZipOutputStream(fs.create(new Path("Output.zip")));
where fs is a FileSystem object (note that fs.create takes a Path, not a String).
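For anyone who wants a starting point, here is a rough sketch of the zip-writing part, reconstructed from memory rather than taken from the original job. It uses only java.util.zip and takes a plain OutputStream, so the class name ZipSketch, the method writeZip, and the in-memory Map of file names to bytes are all made up for illustration. In a reducer, the stream would presumably be the one returned by fs.create(new Path("Output.zip")).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not the code from the original job.
final class ZipSketch {

    // Writes each (name, bytes) pair as one entry of a single zip archive.
    // Assumption: in a reducer, `out` is the stream obtained from
    // fs.create(new Path("Output.zip")) on a Hadoop FileSystem.
    static void writeZip(OutputStream out, Map<String, byte[]> files) throws IOException {
        // Closing the ZipOutputStream writes the central directory and
        // closes the underlying stream as well.
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        }
    }
}
```

With many reducers, each one would need its own output path (for example, one that includes the task id), otherwise they would all try to create the same file. For the original question, two such streams (one for the binary files, one for the text files) plus a plain text descriptor would give the requested layout.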
On Thu, Jul 23, 2009 at 8:48 PM, Mark Kerzner <[email protected]> wrote:
> Thank you, MultipleOutputFormat is sufficient.
> Mark
>
> On Thu, Jul 23, 2009 at 12:24 AM, Amogh Vasekar <[email protected]>
> wrote:
>
> > Does MultipleOutputFormat suffice?
> >
> > Cheers!
> > Amogh
> >
> > -----Original Message-----
> > From: Mark Kerzner [mailto:[email protected]]
> > Sent: Thursday, July 23, 2009 6:24 AM
> > To: [email protected]
> > Subject: Output of a Reducer as a zip file?
> >
> > Hi,
> > my output consists of a number of binary files, corresponding text files,
> > and one descriptor file. Is there a way for my reducer to produce a zip
> > of all binary files, another zip of all text ones, and a separate text
> > descriptor? If not, how close to this can I get? For example, I could code
> > the binary and the text into one text line of an output file, but then I
> > would need some additional processing.
> >
> > Thank you,
> > Mark
> >
>
--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals