Re: Best way to write multiple files from a MR job?

Nick Cen Tue, 03 Mar 2009 19:17:10 -0800

Have you tried MultipleOutputFormat and its subclasses?
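
For what it's worth, here is a rough sketch of the naming rule involved. In the old org.apache.hadoop.mapred.lib API, subclasses of MultipleOutputFormat (MultipleTextOutputFormat, for example) can override generateFileNameForKeyValue(key, value, name), where name is the task's default leaf name such as part-00000. The class below only illustrates that rule without depending on Hadoop. TypedOutputNames and fileNameFor are hypothetical names, and it assumes the mapper emits the record type as the key.

```java
// Hadoop-free illustration of a per-type output naming rule.
// TypedOutputNames and fileNameFor are hypothetical names, not Hadoop APIs.
final class TypedOutputNames {
    private TypedOutputNames() {}

    /**
     * Mirrors what an override of generateFileNameForKeyValue(key, value, name)
     * could return: one subdirectory per record type, keeping the per-task
     * leaf name so that files written by different tasks do not collide.
     */
    static String fileNameFor(String recordType, String leafName) {
        // Assumption: record types are short tokens; anything outside
        // letters, digits, '_' and '-' is replaced so the path stays simple.
        String safeType = recordType.replaceAll("[^A-Za-z0-9_-]", "_");
        return safeType + "/" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("orders", "part-00000"));
        System.out.println(fileNameFor("clicks", "part-00003"));
    }
}
```

Because the leaf name is kept, each map task still writes its own file under every type directory, so no two tasks share a file.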


2009/3/4 Stuart White <stuart.whi...@gmail.com>

> I have a large amount of data, from which I'd like to extract multiple
> different types of data, writing each type of data to different sets
> of output files.  What's the best way to accomplish this?  (I should
> mention, I'm only using a mapper.  I have no need for sorting or
> reduction.)
>
> Of course, if I only wanted one output file, I could just .collect() the
> output from my mapper and let MapReduce write the output for me.  But,
> to get multiple output files, the only way I can see is to manually
> write the files myself from within my mapper.  If that's the correct
> way, then how can I get a unique filename for each mapper instance?
> Obviously Hadoop has solved this problem, because it writes out its
> partition files (part-00000, etc.) with unique numbers.  Is there a
> way for my mappers to get this unique number being used so they can
> use it to ensure a unique filename?
>
> Thanks!
>
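
On the unique number: in the old mapred API, each task's JobConf carries its partition index under the property mapred.task.partition, which a mapper can read in configure(JobConf). That index is the number behind part-00000. A minimal sketch of turning it into a file name, again without depending on Hadoop; TaskFileNames and uniqueName are hypothetical names, and the partition value is assumed to have been read from the job configuration already.

```java
import java.util.Locale;

// Hadoop-free illustration; TaskFileNames and uniqueName are hypothetical names.
final class TaskFileNames {
    private TaskFileNames() {}

    /**
     * Builds a name in the same style as Hadoop's part-00000 files:
     * a prefix, a dash, and the task's partition index zero-padded to 5 digits.
     */
    static String uniqueName(String prefix, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s-%05d", prefix, partition);
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("orders", 0));
        System.out.println(uniqueName("clicks", 12));
    }
}
```

Note that a name built from the partition index alone is shared by all attempts of the same task, so retried or speculative attempts would target the same file if you write it yourself.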



-- 
http://daily.appspot.com/food/
