    [ http://issues.apache.org/jira/browse/HADOOP-7?page=comments#action_12421847 ]

Mikkel Kamstrup Erlandsen commented on HADOOP-7:
------------------------------------------------
This is likely me being dumb, but I don't think this issue is fixed. When I run any of the provided example programs wordcount/grep (also pi with speculative execution enabled), reduce tasks do not start before all map tasks have completed. My cluster contains three nodes and I am running Hadoop 0.4.0.

> MapReduce has a series of problems concerning task-allocation to worker nodes
> -----------------------------------------------------------------------------
>
>          Key: HADOOP-7
>          URL: http://issues.apache.org/jira/browse/HADOOP-7
>      Project: Hadoop
>   Issue Type: Bug
>  Environment: All
>     Reporter: Mike Cafarella
>      Fix For: 0.1.0
>
>  Attachments: jobtracker.patch
>
>
> The MapReduce JobTracker is not great at allocating tasks to TaskTracker worker nodes.
> Here are the problems:
> 1) There is no speculative execution of tasks
> 2) Reduce tasks must wait until all map tasks are completed before doing any work
> 3) TaskTrackers don't distinguish between Map and Reduce jobs. Also, the number of tasks at a single node is limited to some constant. That means you can get weird deadlock problems upon machine failure. The reduces take up all the available execution slots, but they don't do productive work, because they're waiting for a map task to complete. Of course, that map task won't even be started until the reduce tasks finish, so you can see the problem...
> 4) The JobTracker is so complicated that it's hard to fix any of these.
> The right solution is a rewrite of the JobTracker to be a lot more flexible in task handling.
> It has to be a lot simpler. One way to make it simpler is to add an abstraction I'll call "TaskInProgress". Jobs are broken into chunks called TasksInProgress. All the TaskInProgress objects must be complete, somehow, before the Job is complete.
> A single TaskInProgress can be executed by one or more Tasks. TaskTrackers are assigned Tasks.
> If a Task fails, we report it back to the JobTracker, where the TaskInProgress lives. The TIP can then decide whether to launch additional Tasks or not.
> Speculative execution is handled within the TIP. It simply launches multiple Tasks in parallel. The TaskTrackers have no idea that these Tasks are actually doing the same chunk of work. The TIP is complete when any one of its Tasks is complete.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
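
For readers following the quoted proposal, here is a rough sketch of the "TaskInProgress" idea: one chunk of work, possibly several Tasks running it, and the chunk counts as done as soon as any one of them succeeds. This is not the code from jobtracker.patch or from Hadoop itself. The class name, the method names, and the maxFailures limit are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not Hadoop's actual TaskInProgress class.
class TaskInProgressSketch {
    enum AttemptState { RUNNING, SUCCEEDED, FAILED }

    private final String chunkId;
    private final int maxFailures; // assumed retry limit, not from the issue text
    private final Map<String, AttemptState> attempts = new HashMap<>();
    private int nextAttempt = 0;
    private int failures = 0;
    private boolean complete = false;

    TaskInProgressSketch(String chunkId, int maxFailures) {
        this.chunkId = chunkId;
        this.maxFailures = maxFailures;
    }

    // Launch one more Task for this chunk. Calling this twice before either
    // Task finishes models speculative execution: the workers running the
    // Tasks do not know they are doing the same chunk of work.
    String launchAttempt() {
        String id = chunkId + "_attempt" + nextAttempt++;
        attempts.put(id, AttemptState.RUNNING);
        return id;
    }

    // The TIP is complete when any one of its Tasks completes.
    void attemptSucceeded(String id) {
        attempts.put(id, AttemptState.SUCCEEDED);
        complete = true;
    }

    // A failure is reported back to the TIP, which decides whether another
    // Task should be launched. Returns true if it wants a relaunch.
    boolean attemptFailed(String id) {
        attempts.put(id, AttemptState.FAILED);
        failures++;
        return !complete && failures < maxFailures;
    }

    boolean isComplete() {
        return complete;
    }

    public static void main(String[] args) {
        TaskInProgressSketch tip = new TaskInProgressSketch("map_0", 3);
        String a = tip.launchAttempt();
        String b = tip.launchAttempt(); // speculative duplicate of the same chunk
        System.out.println("relaunch after failure of " + a + ": " + tip.attemptFailed(a));
        tip.attemptSucceeded(b);
        System.out.println("complete: " + tip.isComplete());
    }
}
```

In this sketch the decision logic lives entirely in the TIP, which matches the proposal's point that the workers only ever see individual Tasks. How a real JobTracker would pick the worker for each Task is left out.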
