[ http://issues.apache.org/jira/browse/HADOOP-580?page=comments#action_12440501 ]

Doug Cutting commented on HADOOP-580:
-------------------------------------

It sounds like you want to run user code in the TaskTracker. Right now, user code only runs in per-task child processes. We also run some in the JobTracker, but would like to get rid of that, so that no long-running daemons run user code, as discussed in the following thread:

http://www.mail-archive.com/hadoop-dev%40lucene.apache.org/msg03967.html

> Job setup and take down on Nodes
> --------------------------------
>
> Key: HADOOP-580
> URL: http://issues.apache.org/jira/browse/HADOOP-580
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Benjamin Reed
>
> It would be nice if there was a hook for doing job provisioning and cleanup
> on compute nodes. The TaskTracker implicitly knows when a job starts (a task
> for the job is received) and pollForTaskWithClosedJob() will explicitly say
> that a job is finished if a Map task has been run (If only Reduce tasks have
> run and are finished I don't think pollForTaskWithClosedJob() will return
> anything will it?), but child Tasks do not get this information.
> It would be nice if there was a hook so that programmers could do some
> provisioning when a job starts and cleanup when a job ends. Caching addresses
> some of the provisioning, but in some cases a helper daemon may need to be
> started or the results of queries need to be retrieved and having startJob(),
> finishJob() callbacks that happen exactly once for each node that runs part
> of the job would be wonderful.
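
For illustration only, the "exactly once per node" behaviour requested for startJob() and finishJob() could be tracked roughly as in the sketch below. None of this is existing Hadoop API: JobNodeHook, NodeJobLifecycle, taskReceived(), jobClosed() and RecordingHook are hypothetical names, and the sketch only covers the per-node bookkeeping. It does not settle where the callbacks would run; given the concern above about user code in long-running daemons, they would presumably have to be invoked from a child process rather than from the TaskTracker itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-node job hook; not part of Hadoop. */
interface JobNodeHook {
    void startJob(String jobId);
    void finishJob(String jobId);
}

/**
 * Hypothetical bookkeeping that fires each callback at most once
 * per job on the node that owns this object.
 */
class NodeJobLifecycle {
    private final JobNodeHook hook;
    // Jobs that have had startJob() fired and not yet finishJob().
    private final Set<String> activeJobs = new HashSet<>();

    NodeJobLifecycle(JobNodeHook hook) {
        this.hook = hook;
    }

    /** Called when a task for jobId arrives; only the first task triggers startJob(). */
    synchronized void taskReceived(String jobId) {
        if (activeJobs.add(jobId)) {
            hook.startJob(jobId);
        }
    }

    /** Called when the node learns the job is closed; ignored if the job never started here. */
    synchronized void jobClosed(String jobId) {
        if (activeJobs.remove(jobId)) {
            hook.finishJob(jobId);
        }
    }
}

/** Demonstration hook that simply records the order of callbacks. */
class RecordingHook implements JobNodeHook {
    final List<String> events = new ArrayList<>();

    @Override
    public void startJob(String jobId) {
        events.add("start:" + jobId);
    }

    @Override
    public void finishJob(String jobId) {
        events.add("finish:" + jobId);
    }
}
```

With this bookkeeping, repeated taskReceived() or jobClosed() calls for the same job collapse into a single startJob() and a single finishJob(), regardless of whether the tasks that ran on the node were Map or Reduce tasks.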