Yes. In fact, the preferred way to do large jobs is to chain the
output of one job as the input to the next. In your first job you
would set the output path and format like this:
JobConf job1 = new NutchJob(getConf());
job1.setOutputPath(new Path("your path"));
job1.setOutputFormat(SequenceFileOutputFormat.class);
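In this example we are using SequenceFileOutputFormat, but it could
be any format. Sequence and Map file formats are the most common.
The second job then reads the first job's output as its input. Use the
same path, and an input format that matches the first job's output format.
Make sure the first job has finished before starting the second (for
example, JobClient.runJob(job1) blocks until the job completes).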
JobConf job2 = new NutchJob(getConf());
job2.addInputPath(new Path("your path"));
job2.setInputFormat(SequenceFileInputFormat.class);
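If it helps to see the hand-off end to end, here is a minimal sketch of the same idea in plain Java, with no Hadoop or Nutch classes involved. The names ChainSketch, runJob1 and runJob2 are made up for illustration, and a temp file stands in for "your path". The first stage writes word counts to the shared path; the second stage reads that path and groups the words by count.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for two chained jobs; not the Hadoop API.
public class ChainSketch {

    // Stage 1: count words and write "word<TAB>count" lines to outputFile.
    static void runJob1(List<String> lines, Path outputFile) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(outputFile, out);
    }

    // Stage 2: read stage 1's output file and group the words by their count.
    static Map<Integer, List<String>> runJob2(Path inputFile) throws IOException {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : Files.readAllLines(inputFile)) {
            String[] kv = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(kv[1]), k -> new ArrayList<>())
                   .add(kv[0]);
        }
        return byCount;
    }

    public static void main(String[] args) throws IOException {
        // The shared path plays the role of "your path" in both jobs.
        Path shared = Files.createTempFile("job1-output", ".tsv");
        runJob1(Arrays.asList("a b a", "b c a"), shared); // must finish first
        System.out.println(runJob2(shared)); // prints {1=[c], 2=[b], 3=[a]}
    }
}
```

The only things the two stages share are the path and the record layout, which is the same contract the output format and input format settings express in the real jobs.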
Hope this helps.
Dennis Kubes
Phantom wrote:
Hi
Is there a way to chain Map/Reduce tasks? What I mean is that I want the
output of one MapReduce task to serve as the input to another MapReduce
task. Could someone please show me how I can achieve this?
Thanks
Avinas