I know this works for 0.18.x. I'm not using 0.20 yet, but as long as the API
hasn't changed too much this should be pretty straightforward.
// "hdfs" is the FileSystem holding the output (e.g., FileSystem.get(conf)),
// and "conf" is the Configuration/JobConf of the job you are setting up.
Path prevOutputPath = new Path("...");
for (FileStatus fstatus : hdfs.listStatus(prevOutputPath)) {
    if (!fstatus.isDir()) {
        DistributedCache.addCacheFile(fstatus.getPath().toUri(), conf);
    }
}
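In case it helps with the rest of the chain, here is a minimal sketch of how the follow-up job could read those files back out of the distributed cache in a mapper's or reducer's setup. The class and method names (CachedSideData, readSideData) are just illustrative, not anything from Hadoop itself:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative helper for the downstream job: reads every file that the
// previous job added to the distributed cache.
public class CachedSideData {

    public static void readSideData(Configuration conf) throws IOException {
        // Local paths where the framework copied the cached files on this node.
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        if (cached == null) {
            return; // nothing was placed in the cache
        }
        FileSystem localFs = FileSystem.getLocal(conf);
        for (Path p : cached) {
            FSDataInputStream in = localFs.open(p);
            try {
                // Parse the file here, e.g. load it into an in-memory lookup table.
            } finally {
                in.close();
            }
        }
    }
}

In the 0.20 mapreduce API you would call something like this from Mapper.setup() with context.getConfiguration(). And if the previous job ran with a single reducer (setNumReduceTasks(1)), there will be just one part file to parse.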
-----Original Message-----
From: Tiago Veloso [mailto:[email protected]]
Sent: Monday, April 26, 2010 12:11 PM
To: [email protected]
Cc: Tiago Veloso
Subject: Re: Chaining M/R Jobs
On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:
> I don't usually bother renaming the files. If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job and then add those to the distributed cache. If the data
> is fairly small, you can set the number of reducers to 1 on the previous
> step as well.
And how do I iterate over a directory? Could you give me some sample code?
If relevant, I am using Hadoop 0.20.2.
Tiago Veloso
[email protected]