[ https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632003#action_12632003 ]
eric baldeschwieler commented on HADOOP-3136:
---------------------------------------------

What if the heartbeat was really cheap? If all it did was inform the JT of the opportunity to schedule a task to the node and return task-complete info, then I don't think we'd have a problem with a heartbeat per task completion. I'd advocate this. Decouple scheduling decisions from heartbeats. The JT could then do whatever scheduling it chose and assign new tasks to the TT directly when ready.

---

The problem with assigning many local tasks to a TT on a heartbeat is that you could still end up assigning many tasks to a few nodes and none to others, and getting slower execution. We still might want to try it, since the results could still be better than any of our current options and it is simple to code and understand.

The suggestion (a rough Java sketch appears at the end of this message):

if (node local tasks available) {
    assign as many node local tasks as available to the TT
} else if (switch local task is available) {
    assign one switch local task
} else {
    assign one remote task
}

I'm sure we could do better, but this is simple and worth trying.

---

I think it is clear we're going to need a global scheduler that plans the allocation of a job's tasks to nodes globally and then monitors execution and adjusts the plan.

---

On reduces, it sounds good to allocate those separately.

> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3136_0_20080805.patch, HADOOP-3136_1_20080809.patch, HADOOP-3136_2_20080911.patch
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat. We could probably give the tasktracker multiple tasks, subject to the max number of free slots it has - for maps we could assign it data-local tasks. We could probably run some logic to decide what to give it if we run out of data-local tasks (e.g., tasks from overloaded racks, tasks that have the least locality, etc.). In addition to maps, if it has reduce slots free, we could give it reduce task(s) as well. Again, for reduces we could probably run some logic to give more tasks to nodes that are closer to the nodes running most of the maps (assuming the data generated is proportional to the number of maps). For example, if rack1 has 70% of the input splits and we know that most maps are data/rack local, we try to schedule ~70% of the reducers there.
> Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
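Appendix: a minimal Java sketch of the greedy per-heartbeat heuristic suggested in the comment above. The CandidateTask and Locality types, the GreedyHeartbeatScheduler class, and the assign() helper are hypothetical illustrations, not the actual JobInProgress/TaskTracker scheduling API; this is a sketch of the assignment rule only, not a drop-in implementation.

import java.util.ArrayList;
import java.util.List;

// Hypothetical locality classification of a runnable map task relative to the
// heartbeating TaskTracker.
enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

// Hypothetical wrapper pairing a runnable task with its locality.
class CandidateTask {
  final String taskId;
  final Locality locality;
  CandidateTask(String taskId, Locality locality) {
    this.taskId = taskId;
    this.locality = locality;
  }
}

class GreedyHeartbeatScheduler {
  /**
   * Picks map tasks for one TaskTracker heartbeat:
   *  - take every node-local task, capped by the tracker's free map slots;
   *  - otherwise take a single rack-local ("switch local") task;
   *  - otherwise take a single off-rack (remote) task.
   */
  List<CandidateTask> assign(List<CandidateTask> runnable, int freeMapSlots) {
    List<CandidateTask> assigned = new ArrayList<CandidateTask>();
    if (freeMapSlots <= 0 || runnable.isEmpty()) {
      return assigned;                     // nothing to do on this heartbeat
    }

    // 1. As many node-local tasks as the tracker has free map slots.
    for (CandidateTask t : runnable) {
      if (t.locality == Locality.NODE_LOCAL && assigned.size() < freeMapSlots) {
        assigned.add(t);
      }
    }
    if (!assigned.isEmpty()) {
      return assigned;
    }

    // 2. No node-local work available: fall back to one rack-local task.
    for (CandidateTask t : runnable) {
      if (t.locality == Locality.RACK_LOCAL) {
        assigned.add(t);
        return assigned;
      }
    }

    // 3. Nothing local at all: every remaining candidate is off-rack, so hand
    //    out exactly one remote task to keep the tracker busy without flooding
    //    it with non-local work.
    assigned.add(runnable.get(0));
    return assigned;
  }
}

The "assign all node-local, else one rack-local, else one remote" structure mirrors the pseudocode in the comment; a real scheduler would also have to decide reduce assignment and handle the imbalance concern raised above (a few nodes soaking up most of the local work).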