RE: Chaining M/R Jobs

Xavier Stevens Mon, 26 Apr 2010 11:40:04 -0700

I don't usually bother renaming the files.  If you know you want all of
the files, you can just iterate over the files in the previous job's
output directory and add each one to the distributed cache.  If the data
is fairly small, you can also set the number of reducers to 1 on the
previous step so that it writes a single output file.
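
A minimal sketch of the "iterate over the output directory" step.  It uses
java.nio.file on a local directory as a stand-in for HDFS so that it runs
without a cluster; the class and method names (PartFileLister,
partFileUris) are made up for illustration.  With Hadoop you would list
the directory with FileSystem.listStatus or FileSystem.globStatus instead,
and hand each URI to DistributedCache.addCacheFile(uri, conf) before
submitting the next job.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a local-filesystem stand-in for listing a job's output on HDFS.
class PartFileLister {

    // Returns the URIs of all part-* files in the previous job's output
    // directory, sorted by name. Anything that is not a regular part file
    // (for example a _logs directory) is skipped.
    public static List<URI> partFileUris(Path outputDir) throws IOException {
        List<URI> uris = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    // On a cluster, this is where each file would be
                    // registered with the distributed cache.
                    uris.add(p.toUri());
                }
            }
        }
        Collections.sort(uris);
        return uris;
    }
}
```

The "part-*" glob is meant to cover both part-00000 and part-r-00000
style names, so the calling code does not need to know how many reducers
the previous job ran.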



-Xavier


-----Original Message-----
From: Eric Sammer [mailto:[email protected]] 
Sent: Monday, April 26, 2010 11:33 AM
To: [email protected]
Subject: Re: Chaining M/R Jobs

The easiest way to do this is to write your job outputs to a known
place and then use the FileSystem APIs to rename the part-* files to
what you want them to be.
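
A rough sketch of the rename step, in case it helps.  It runs against a
local directory with java.nio.file as a stand-in for HDFS, and the names
RenamePartFiles, renameParts and the "prefix" argument are invented for
the example.  On a cluster the same shape would use
FileSystem.globStatus to find the part-* files and FileSystem.rename to
move them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: renames job output files in a local directory.
class RenamePartFiles {

    // Renames every part-* file in outputDir to <prefix>-<5-digit index>,
    // in sorted order, and returns the new paths. Files that do not match
    // part-* are left alone.
    public static List<Path> renameParts(Path outputDir, String prefix) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // keep the numbering stable across runs

        List<Path> renamed = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            Path target = outputDir.resolve(String.format("%s-%05d", prefix, i));
            // Files.move throws if the target already exists, so a second
            // run will not silently overwrite earlier output.
            Files.move(parts.get(i), target);
            renamed.add(target);
        }
        return renamed;
    }
}
```

Once the files have predictable names, the next job can refer to them
directly when it sets up the DistributedCache.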

On Mon, Apr 26, 2010 at 2:22 PM, Tiago Veloso <[email protected]>
wrote:
> Hi,
>
> I'm trying to find a way to control the output file names. I need this
> because I have a situation where I need to run a Job and then use its
> output in the DistributedCache.
>
> So far the only way I've seen that makes it possible is rewriting the
> OutputFormat class, but that seems like a lot of work for such a simple
> task. Is there any way to do what I'm looking for?
>
> Tiago Veloso
> [email protected]
>
>
>
>



-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
