It worked, thanks.
Tiago Veloso
[email protected]
On Apr 26, 2010, at 8:57 PM, Xavier Stevens wrote:
> I know this works for 0.18.x. I'm not using 0.20 yet, but as long as the API
> hasn't changed too much, this should be pretty straightforward.
>
>
> Path prevOutputPath = new Path("...");
> for (FileStatus fstatus : hdfs.listStatus(prevOutputPath)) {
>     if (!fstatus.isDir()) {
>         DistributedCache.addCacheFile(fstatus.getPath().toUri(), conf);
>     }
> }
>
>
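A self-contained sketch of the same approach against the 0.20.x mapred API; the surrounding class, the JobConf/FileSystem setup, and the comments are assumed here rather than taken from the snippet above:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CachePreviousOutput {

    public static void main(String[] args) throws IOException {
        // Old (mapred) API job configuration; DistributedCache.addCacheFile
        // only needs a Configuration, which JobConf extends.
        JobConf conf = new JobConf(CachePreviousOutput.class);

        // Output directory of the previous job in the chain.
        Path prevOutputPath = new Path("...");

        FileSystem hdfs = FileSystem.get(conf);
        for (FileStatus fstatus : hdfs.listStatus(prevOutputPath)) {
            // Skip subdirectories (e.g. _logs); cache only the output files.
            if (!fstatus.isDir()) {
                DistributedCache.addCacheFile(fstatus.getPath().toUri(), conf);
            }
        }

        // ... set mapper/reducer classes, input/output paths, then submit ...
    }
}
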
> -----Original Message-----
> From: Tiago Veloso [mailto:[email protected]]
> Sent: Monday, April 26, 2010 12:11 PM
> To: [email protected]
> Cc: Tiago Veloso
> Subject: Re: Chaining M/R Jobs
>
> On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:
>
>> I don't usually bother renaming the files. If you know you want all of
>> the files, you just iterate over the files in the output directory from
>> the previous job. And then add those to distributed cache. If the data
>> is fairly small you can set the number of reducers to 1 on the previous
>> step as well.
>
>
> And how do I iterate over a directory? Could you give me some sample code?
>
> If relevant, I am using Hadoop 0.20.2.
>
> Tiago Veloso
> [email protected]
>