Hi,

Do your "steps" qualify as separate MR jobs? If so, the JobClient APIs should be more than sufficient for such dependencies: JobClient.runJob() blocks until a job completes, so you can simply submit the second job after the first one returns. You can add the whole output directory of one job as the input of the next to read all of its files, and provide a PathFilter to ignore any files you don't want processed, such as side-effect files. However, to add files recursively you need to list the FileStatus entries yourself and add each path to the input as required (probably not needed in your case, since the input is the output of an MR job and therefore a flat directory).
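For concreteness, here is a minimal sketch of that pattern against the old org.apache.hadoop.mapred API (current as of this thread). The paths, job names, and the PartFilesOnly filter are my own illustrative choices, not anything taken from your job:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStepDriver {

  // Skip side-effect files such as _logs; accept the part-NNNNN files.
  public static class PartFilesOnly implements PathFilter {
    public boolean accept(Path p) {
      return !p.getName().startsWith("_");
    }
  }

  public static void main(String[] args) throws IOException {
    Path step1Out = new Path("step1-output");  // hypothetical path

    JobConf step1 = new JobConf(TwoStepDriver.class);
    step1.setJobName("step1");
    FileInputFormat.setInputPaths(step1, new Path("input"));
    FileOutputFormat.setOutputPath(step1, step1Out);
    JobClient.runJob(step1);  // blocks until step 1 finishes

    // Step 2 reads the entire output directory of step 1, skipping side files.
    JobConf step2 = new JobConf(TwoStepDriver.class);
    step2.setJobName("step2");
    FileInputFormat.setInputPaths(step2, step1Out);
    FileInputFormat.setInputPathFilter(step2, PartFilesOnly.class);
    FileOutputFormat.setOutputPath(step2, new Path("step2-output"));
    JobClient.runJob(step2);
  }
}

If your inputs were nested, you would list the directory contents yourself with FileSystem.get(conf).listStatus(dir, filter) and call FileInputFormat.addInputPath() for each FileStatus you want; as noted, the flat output of an MR job shouldn't need this. Also, recent versions of FileInputFormat already skip files whose names start with "_" or "." by default, so a custom PathFilter mainly matters for other exclusions.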
Thanks,
Amogh

On 1/18/10 6:41 AM, "Mark Kerzner" <[email protected]> wrote:

Hi,

I am writing a second step to run after my first Hadoop job step has finished. It is to pick up the results of the previous step and do further processing on them. Therefore, I have two questions, please:

1. Is the output file always called part-00000?
2. Am I perhaps better off reading all files in the output directory, and if so, how do I do it?

Thank you,
Mark

PS. Thank you, guys, for answering my questions - that's a tremendous help and a great resource.
