Re: Chaining M/R Jobs

Tiago Veloso Mon, 26 Apr 2010 12:11:19 -0700

On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:

> I don't usually bother renaming the files.  If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job.  And then add those to distributed cache.  If the data
> is fairly small you can set the number of reducers to 1 on the previous
> step as well.



And how do I iterate over a directory? Could you give me some sample code?

In case it is relevant, I am using Hadoop 0.20.2.

Tiago Veloso
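
For readers of the archive, here is a minimal sketch of the approach quoted above: list the previous job's output directory and add each output file to the distributed cache. It is not code from this thread. It assumes the Hadoop 0.20.x API, and the class name CacheJobOutput, the method name addOutputToCache, and the "part-" file name filter are illustrative choices.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheJobOutput {

    // Hypothetical helper: adds every file named "part-*" found directly
    // under outputDir to the distributed cache settings stored in conf.
    // Returns the number of files added.
    public static int addOutputToCache(Path outputDir, Configuration conf)
            throws IOException {
        // Resolve the file system that owns the path (HDFS, local, ...).
        FileSystem fs = outputDir.getFileSystem(conf);

        // List the direct children of the output directory.
        FileStatus[] entries = fs.listStatus(outputDir);
        int added = 0;
        if (entries == null) {
            // Some versions return null when the directory does not exist.
            return added;
        }

        for (FileStatus entry : entries) {
            String name = entry.getPath().getName();
            // Skip subdirectories such as _logs and anything that is not
            // a task output file (assumed to be named part-NNNNN).
            if (!entry.isDir() && name.startsWith("part-")) {
                DistributedCache.addCacheFile(entry.getPath().toUri(), conf);
                added++;
            }
        }
        return added;
    }
}
```

The helper would be called on the second job's configuration before that job is submitted, passing the first job's output directory. The tasks of the second job can then locate the local copies with DistributedCache.getLocalCacheFiles(conf).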
