I don't usually bother renaming the files. If you know you want all of the files, you can just iterate over the files in the previous job's output directory and add each one to the distributed cache. If the data is fairly small, you can also set the number of reducers to 1 on the previous step so that it writes a single output file.
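
Roughly, the selection step could look like the sketch below. It is plain Java so it runs without a cluster: the helper name PartFileSelector, the namenode host and the paths are made up for illustration. In a real driver I'd expect the file names to come from FileSystem.listStatus() on the output directory, and each selected URI to be passed to DistributedCache.addCacheFile(uri, conf) before submitting the second job.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): picks the part-* files out of a
// listing of a job output directory.
class PartFileSelector {

    // Returns the URIs of the part-* files among fileNames, resolved against
    // outputDir. Other entries (for example _logs or _SUCCESS) are skipped.
    static List<URI> select(URI outputDir, List<String> fileNames) {
        // Ensure the directory URI ends in "/" so resolve() appends the file
        // name instead of replacing the last path segment.
        String dir = outputDir.toString();
        URI base = dir.endsWith("/") ? outputDir : URI.create(dir + "/");

        List<URI> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith("part-")) {
                selected.add(base.resolve(name));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed output location of the first job; adjust to your setup.
        URI outputDir = URI.create("hdfs://namenode/user/example/job1-output");
        List<String> listing = Arrays.asList("_logs", "part-00000", "part-00001");

        for (URI uri : select(outputDir, listing)) {
            // In the second job's driver, this is where the file would be
            // registered with the distributed cache.
            System.out.println(uri);
        }
    }
}
```

Since nothing is renamed, the tasks of the second job just read whatever files show up in the cache rather than looking for one fixed name.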

-Xavier

-----Original Message-----
From: Eric Sammer [mailto:[email protected]]
Sent: Monday, April 26, 2010 11:33 AM
To: [email protected]
Subject: Re: Chaining M/R Jobs

The easiest way to do this is to write your job outputs to a known place and then use the FileSystem APIs to rename the part-* files to what you want them to be.

On Mon, Apr 26, 2010 at 2:22 PM, Tiago Veloso <[email protected]> wrote:
> Hi,
>
> I'm trying to find a way to control the output file names. I need this because I have a situation where I need to run a Job and then use its output in the DistributedCache.
>
> So far the only way I've seen that makes it possible is rewriting the OutputFormat class, but that seems like a lot of work for such a simple task. Is there any way to do what I'm looking for?
>
> Tiago Veloso
> [email protected]

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
