On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:

> I don't usually bother renaming the files. If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job. And then add those to distributed cache. If the data
> is fairly small you can set the number of reducers to 1 on the previous
> step as well.

And how do I iterate over a directory? Could you give me some sample code? In case it is relevant, I am using Hadoop 0.20.2.

Tiago Veloso
[email protected]
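
For reference, here is a minimal sketch of the approach described in the quoted message, assuming the Hadoop 0.20.2 API (FileSystem.listStatus and DistributedCache.addCacheFile). The class and method names (CacheJobOutput, addOutputToCache, isHiddenName) are hypothetical and chosen only for illustration, and skipping entries whose names start with "_" or "." is an assumption about which files count as job output, not something stated in the thread.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class; not part of Hadoop.
public class CacheJobOutput {

    // Assumption: names starting with "_" or "." (for example _logs or
    // checksum files) are bookkeeping entries rather than job data.
    static boolean isHiddenName(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Lists the files directly under outputDir and registers each one
    // with the distributed cache for the job configured by conf.
    // Returns the URIs that were added.
    static List<URI> addOutputToCache(Path outputDir, Configuration conf)
            throws IOException {
        FileSystem fs = outputDir.getFileSystem(conf);
        List<URI> added = new ArrayList<URI>();

        FileStatus[] entries = fs.listStatus(outputDir);
        if (entries == null) {
            // Guard for the case where the directory does not exist.
            return added;
        }

        for (FileStatus entry : entries) {
            // Skip subdirectories and hidden bookkeeping entries.
            if (entry.isDir() || isHiddenName(entry.getPath().getName())) {
                continue;
            }
            URI uri = entry.getPath().toUri();
            DistributedCache.addCacheFile(uri, conf);
            added.add(uri);
        }
        return added;
    }
}
```

The call would go in the driver of the second job, before it is submitted, with conf being that job's configuration and outputDir the output path of the previous job, for example addOutputToCache(new Path(previousOutput), conf), where previousOutput is a placeholder for your own path.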
