You can use MultipleOutputs for this purpose, even though it was not designed for this and a few people on this list are going to raise an eyebrow.
Alex K

On Mon, Apr 26, 2010 at 11:39 AM, Xavier Stevens <[email protected]> wrote:
> I don't usually bother renaming the files. If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job and then add those to the distributed cache. If the data
> is fairly small you can set the number of reducers to 1 on the previous
> step as well.
>
> -Xavier
>
> -----Original Message-----
> From: Eric Sammer [mailto:[email protected]]
> Sent: Monday, April 26, 2010 11:33 AM
> To: [email protected]
> Subject: Re: Chaining M/R Jobs
>
> The easiest way to do this is to write your job outputs to a known
> place and then use the FileSystem APIs to rename the part-* files to
> what you want them to be.
>
> On Mon, Apr 26, 2010 at 2:22 PM, Tiago Veloso <[email protected]>
> wrote:
> > Hi,
> >
> > I'm trying to find a way to control the output file names. I need this
> > because I have a situation where I need to run a Job and then use its
> > output in the DistributedCache.
> >
> > So far the only way I've seen that makes it possible is rewriting the
> > OutputFormat class, but that seems a lot of work for such a simple task.
> > Is there any way to do what I'm looking for?
> >
> > Tiago Veloso
> > [email protected]
>
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
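For readers finding this thread later: both Eric's and Xavier's suggestions come down to listing the part-* files a job leaves in its output directory and then either renaming them or passing them straight on. The following is a minimal sketch of that logic in plain Java against a local directory, using java.nio.file as a stand-in for Hadoop's org.apache.hadoop.fs.FileSystem (where listStatus and rename play the same roles). The class name PartFileRenamer and the "prefix-N" naming scheme are hypothetical choices for illustration, not part of any Hadoop API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper. Stand-in for Hadoop's FileSystem.listStatus/rename:
// it only operates on a local directory.
public class PartFileRenamer {

    /** True for job output files such as part-00000 or part-r-00000. */
    static boolean isPartFile(String name) {
        return name.startsWith("part-");
    }

    /**
     * Pure planning step: maps each part-* name to "prefix-N" in sorted order.
     * Other entries in the directory (e.g. _logs) are skipped.
     */
    static Map<String, String> planRenames(List<String> fileNames, String prefix) {
        List<String> parts = new ArrayList<>();
        for (String name : fileNames) {
            if (isPartFile(name)) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < parts.size(); i++) {
            plan.put(parts.get(i), prefix + "-" + i);
        }
        return plan;
    }

    /** Applies the plan to a local output directory and returns the new paths. */
    static List<Path> renameParts(Path outputDir, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        List<Path> renamed = new ArrayList<>();
        for (Map.Entry<String, String> e : planRenames(names, prefix).entrySet()) {
            // Files.move returns the target path.
            renamed.add(Files.move(outputDir.resolve(e.getKey()),
                                   outputDir.resolve(e.getValue())));
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileRenamer <outputDir> <prefix>");
            return;
        }
        for (Path p : renameParts(Paths.get(args[0]), args[1])) {
            System.out.println(p);
        }
    }
}
```

With the files renamed (or simply listed, as Xavier suggests), each path can be registered with DistributedCache.addCacheFile(URI, Configuration) in the next job's configuration. Filtering on the part- prefix keeps bookkeeping entries such as _logs (and _SUCCESS in later Hadoop releases) out of the cache.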
