Hi,

Do your "steps" qualify as separate MR jobs? If so, the JobClient APIs should be more than sufficient for such dependencies: JobClient.runJob() blocks until a job completes, so you can simply submit the second job after the first one returns. You can add the whole output directory of one job as the input of the next to read all of its files, and provide a PathFilter to ignore any files you don't want processed, such as side-effect files. However, to add files recursively you need to list the FileStatus entries yourself and add each path to the input as required (probably not needed in your case, since the input is the output of an MR job and therefore a flat directory).
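For concreteness, here is a minimal sketch of that pattern against the old org.apache.hadoop.mapred API (current as of this thread). The paths, job names, and the PartFilesOnly filter are my own illustrative choices, not anything taken from your job:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStepDriver {

  // Skip side-effect files such as _logs; accept the part-NNNNN files.
  public static class PartFilesOnly implements PathFilter {
    public boolean accept(Path p) {
      return !p.getName().startsWith("_");
    }
  }

  public static void main(String[] args) throws IOException {
    Path step1Out = new Path("step1-output");  // hypothetical path

    JobConf step1 = new JobConf(TwoStepDriver.class);
    step1.setJobName("step1");
    FileInputFormat.setInputPaths(step1, new Path("input"));
    FileOutputFormat.setOutputPath(step1, step1Out);
    JobClient.runJob(step1);  // blocks until step 1 finishes

    // Step 2 reads the entire output directory of step 1, skipping side files.
    JobConf step2 = new JobConf(TwoStepDriver.class);
    step2.setJobName("step2");
    FileInputFormat.setInputPaths(step2, step1Out);
    FileInputFormat.setInputPathFilter(step2, PartFilesOnly.class);
    FileOutputFormat.setOutputPath(step2, new Path("step2-output"));
    JobClient.runJob(step2);
  }
}

If your inputs were nested, you would list the directory contents yourself with FileSystem.get(conf).listStatus(dir, filter) and call FileInputFormat.addInputPath() for each FileStatus you want; as noted, the flat output of an MR job shouldn't need this. Also, recent versions of FileInputFormat already skip files whose names start with "_" or "." by default, so a custom PathFilter mainly matters for other exclusions.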
Thanks,
Amogh

On 1/18/10 6:41 AM, "Mark Kerzner" <[email protected]> wrote:

Hi,

I am writing a second step to run after my first Hadoop job step has finished. It is to pick up the results of the previous step and do further processing on them. Therefore, I have two questions, please:

1. Is the output file always called part-00000?
2. Am I perhaps better off reading all files in the output directory, and if so, how do I do it?

Thank you,
Mark

PS. Thank you, guys, for answering my questions - that's a tremendous help and a great resource.
