It worked, thanks.
Tiago Veloso
[email protected]
On Apr 26, 2010, at 8:57 PM, Xavier Stevens wrote:
> I know this works for 0.18.x. I'm not using 0.20 yet, but as long as the API
> hasn't changed too much, this should be pretty straightforward.
>
>
> Path prevOutputPath = new Path("...");
> for (FileStatus fstatus : hdfs.listStatus(prevOutputPath)) {
>     if (!fstatus.isDir()) {
>         DistributedCache.addCacheFile(fstatus.getPath().toUri(), conf);
>     }
> }
>
>
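A self-contained sketch of the same approach against the 0.20.x mapred API; the surrounding class, the JobConf/FileSystem setup, and the comments are assumed here rather than taken from the snippet above:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CachePreviousOutput {

    public static void main(String[] args) throws IOException {
        // Old (mapred) API job configuration; DistributedCache.addCacheFile
        // only needs a Configuration, which JobConf extends.
        JobConf conf = new JobConf(CachePreviousOutput.class);

        // Output directory of the previous job in the chain.
        Path prevOutputPath = new Path("...");

        FileSystem hdfs = FileSystem.get(conf);
        for (FileStatus fstatus : hdfs.listStatus(prevOutputPath)) {
            // Skip subdirectories (e.g. _logs); cache only the output files.
            if (!fstatus.isDir()) {
                DistributedCache.addCacheFile(fstatus.getPath().toUri(), conf);
            }
        }

        // ... set mapper/reducer classes, input/output paths, then submit ...
    }
}
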
> -----Original Message-----
> From: Tiago Veloso [mailto:[email protected]]
> Sent: Monday, April 26, 2010 12:11 PM
> To: [email protected]
> Cc: Tiago Veloso
> Subject: Re: Chaining M/R Jobs
>
> On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:
>
>> I don't usually bother renaming the files. If you know you want all of
>> the files, you just iterate over the files in the output directory from
>> the previous job. And then add those to distributed cache. If the data
>> is fairly small you can set the number of reducers to 1 on the previous
>> step as well.
>
>
> And how do I iterate over a directory? Could you give me some sample code?
>
> If relevant, I am using Hadoop 0.20.2.
>
> Tiago Veloso
> [email protected]
>