Have you tried MultipleOutputFormat and its subclasses?
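As far as I know, in the old mapred API a subclass of MultipleTextOutputFormat (org.apache.hadoop.mapred.lib) can override generateFileNameForKeyValue(key, value, name). The "name" argument is the per-task leaf name (part-00000 and so on), so it is already unique per mapper. If you do write the files yourself, the task's number should be available from the job configuration under "mapred.task.partition". Below is a rough sketch of just the naming logic in plain Java, with no Hadoop classes, so treat it as an illustration only. The class and method names (OutputNames, fileNameForType, partFileName) are made up for this example, and it assumes the record type is a short string that is safe to use in a path.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: only the file-naming logic.
class OutputNames {

    // What an overridden generateFileNameForKeyValue(key, value, name) might
    // return: one subdirectory per record type, reusing the per-task leaf
    // name (e.g. "part-00000") that the framework passes in.
    static String fileNameForType(String recordType, String leafName) {
        return recordType + "/" + leafName;
    }

    // For files written by hand from a mapper: build a name from the task's
    // partition number (assumed to be read from "mapred.task.partition"),
    // zero-padded to five digits like Hadoop's own part files.
    static String partFileName(String recordType, int partition) {
        return String.format(Locale.ROOT, "%s-part-%05d", recordType, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileNameForType("errors", "part-00000"));
        System.out.println(partFileName("errors", 3));
    }
}
```

With zero reducers the map output should go straight through the output format, so the first approach ought to work for a map-only job without writing any files by hand.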
2009/3/4 Stuart White <stuart.whi...@gmail.com>

> I have a large amount of data, from which I'd like to extract multiple
> different types of data, writing each type of data to different sets
> of output files. What's the best way to accomplish this? (I should
> mention, I'm only using a mapper. I have no need for sorting or
> reduction.)
>
> Of course, if I only wanted 1 output file, I can just .collect() the
> output from my mapper and let mapreduce write the output for me. But,
> to get multiple output files, the only way I can see is to manually
> write the files myself from within my mapper. If that's the correct
> way, then how can I get a unique filename for each mapper instance?
> Obviously hadoop has solved this problem, because it writes out its
> partition files (part-00000, etc...) with unique numbers. Is there a
> way for my mappers to get this unique number being used so they can
> use it to ensure a unique filename?
>
> Thanks!
>

--
http://daily.appspot.com/food/