Jason Venner wrote:
We are new to hadoop - 1 week and counting :)
We have a number of tasks that we want to accomplish with hadoop, and
would like to keep each of the hadoop steps very simple.
To our current limited understanding this means that we need to set up N
hadoop jobs and run them manually one after the other, using the output
of one as the input of the next.
Is there a best practices way of accomplishing this? We are hoping to
avoid gigantic map tasks.
Take a look at the Job Control section of the map/reduce tutorial:
http://lucene.apache.org/hadoop/mapred_tutorial.html#Job+Control
Also browse through
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/JobClient.html#runJob(org.apache.hadoop.mapred.JobConf)
for a code example of how to run and monitor your jobs programmatically.
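For a simple linear chain, something along these lines works: run the first job to
completion with JobClient.runJob, then point the second job's input at the first
job's output directory. This is a rough, untested sketch against the old
org.apache.hadoop.mapred API, using the identity mapper/reducer as stand-ins for
your own classes; depending on your Hadoop version the paths may be set with
JobConf.setInputPath/setOutputPath instead of the FileInputFormat/FileOutputFormat
helpers shown here.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]); // output of step 1, input of step 2
    Path output = new Path(args[2]);

    // Step 1: IdentityMapper/IdentityReducer stand in for your real classes.
    JobConf step1 = new JobConf(ChainedJobs.class);
    step1.setJobName("step-1");
    step1.setMapperClass(IdentityMapper.class);
    step1.setReducerClass(IdentityReducer.class);
    FileInputFormat.setInputPaths(step1, input);
    FileOutputFormat.setOutputPath(step1, intermediate);
    JobClient.runJob(step1); // blocks until step 1 completes, throws if it fails

    // Step 2: reads the directory that step 1 wrote.
    JobConf step2 = new JobConf(ChainedJobs.class);
    step2.setJobName("step-2");
    step2.setMapperClass(IdentityMapper.class);
    step2.setReducerClass(IdentityReducer.class);
    FileInputFormat.setInputPaths(step2, intermediate);
    FileOutputFormat.setOutputPath(step2, output);
    JobClient.runJob(step2);
  }
}

If your jobs have non-linear dependencies rather than a straight pipeline, the
JobControl facility described in the tutorial link above lets you declare jobs
with dependencies and have each one start once its prerequisites have finished.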
Arun
Thank you all and happy computing.