On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:

> I don't usually bother renaming the files.  If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job.  And then add those to distributed cache.  If the data
> is fairly small you can set the number of reducers to 1 on the previous
> step as well.


And how do I iterate over a directory? Could you give me some sample code?

If relevant, I am using Hadoop 0.20.2.
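
For reference, here is a minimal sketch of what the suggestion quoted above might look like on Hadoop 0.20.x: list the previous job's output directory and add each regular file to the distributed cache. The class name CacheOutputFiles and the method name addOutputToCache are made up for illustration, and skipping names that start with "_" or "." is an assumption about which entries are not data (for example the _logs directory), not something stated in this thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class; the name is illustrative only.
public class CacheOutputFiles {

    // Adds every regular file directly under outputDir to the distributed
    // cache recorded in conf. Subdirectories are not descended into.
    public static void addOutputToCache(Path outputDir, Configuration conf)
            throws IOException {
        // Resolve the file system that owns the path (HDFS, local, ...).
        FileSystem fs = outputDir.getFileSystem(conf);

        // One FileStatus per entry directly under outputDir.
        FileStatus[] statuses = fs.listStatus(outputDir);
        if (statuses == null) {
            return; // nothing to add, e.g. the directory does not exist
        }

        for (FileStatus status : statuses) {
            Path file = status.getPath();
            String name = file.getName();

            // Assumption: directories and entries starting with "_" or "."
            // (such as _logs) are bookkeeping, not reducer output.
            if (status.isDir() || name.startsWith("_") || name.startsWith(".")) {
                continue;
            }

            DistributedCache.addCacheFile(file.toUri(), conf);
        }
    }
}
```

This would be called on the configuration of the next job before it is submitted, for example addOutputToCache(new Path("/path/to/previous/output"), conf), where the path is a placeholder. With the new-API Job class, the configuration to pass is probably the one returned by job.getConfiguration(), since Job keeps its own copy.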

Tiago Veloso
[email protected]
