[ https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618597#action_12618597 ]

Vivek Ratan commented on HADOOP-3136:
-------------------------------------

Agree with Owen that it's up to the scheduler to decide how many tasks to assign 
to a TT in one heartbeat. Joydeep, your concerns are very valid - a scheduler 
should not blindly ignore the overall load/state of the system; otherwise you 
can end up with lopsided scheduling. And yes, it's fairly hard to decide 
whether to give the TT more than one task, and if so, what combination of Maps 
and Reduces from which jobs. But schedulers will need to start dealing with 
this. Maybe the first step is to assign more than one task only when the system 
is loaded, and to assign tasks from different jobs (as you mention) to spread 
them around. At some point, a scheduler should also consider the resources 
available on the TT (memory, CPU) and use them to decide what combination of 
Map and Reduce slots should run on that node. But, as Owen says, this should be 
a scheduler decision, and we do want to cut down on heartbeat calls.
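The two ideas above - hand out multiple tasks only when the system is loaded, and round-robin across jobs so no single job monopolizes a TT - could be sketched roughly as below. The class/method names, the 0.5 load threshold, and the job/task strings are illustrative assumptions for this comment, not actual Hadoop scheduler code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per heartbeat, assign up to the TT's free slots when
// the cluster is loaded, otherwise just one task; alternate between jobs.
public class MultiAssignSketch {
    static List<String> assignTasks(int freeMapSlots, int freeReduceSlots,
                                    double clusterLoad, List<String> jobQueue) {
        List<String> assigned = new ArrayList<>();
        // only go beyond one task per heartbeat when the system is loaded
        int budget = (clusterLoad > 0.5) ? freeMapSlots + freeReduceSlots : 1;
        int i = 0;
        while (assigned.size() < budget && !jobQueue.isEmpty()) {
            // round-robin over jobs so one job does not monopolize the TT
            String job = jobQueue.get(i % jobQueue.size());
            assigned.add(job + "-task" + assigned.size());
            i++;
        }
        return assigned;
    }

    public static void main(String[] args) {
        // lightly loaded cluster: a single task is handed out
        System.out.println(assignTasks(2, 2, 0.2, List.of("job1", "job2")));
        // loaded cluster: all four free slots are filled, alternating jobs
        System.out.println(assignTasks(2, 2, 0.9, List.of("job1", "job2")));
    }
}
```

A real scheduler would of course pick concrete map/reduce tasks (with locality) rather than synthesizing names, but the budget-then-spread shape is the point here.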

> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We probably could give the tasktracker multiple tasks subject to the max 
> number of free slots it has - for maps we could assign it data local tasks. 
> We could probably run some logic to decide what to give it if we run out of 
> data local tasks (e.g., tasks from overloaded racks, tasks that have least 
> locality, etc.). In addition to maps, if it has reduce slots free, we could 
> give it reduce task(s) as well. Again for reduces we could probably run some 
> logic to give more tasks to nodes that are closer to nodes running most maps 
> (assuming the data generated is proportional to the number of maps). For 
> example, if rack1 has 70% of the input splits, and we know that most maps 
> are data/rack local, we try to schedule ~70% of the reducers there.
> Thoughts?
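
The proportional placement described in the quoted text (rack with 70% of the splits gets ~70% of the reducers) amounts to a simple share computation. A minimal sketch, assuming per-rack split counts are known; the names here are illustrative, not Hadoop APIs, and rounding remainders are ignored:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: give each rack a share of the reduce tasks
// proportional to its share of the input splits.
public class ReducerPlacementSketch {
    static Map<String, Integer> reducersPerRack(Map<String, Integer> splitsPerRack,
                                                int totalReducers) {
        int totalSplits = splitsPerRack.values().stream()
                                       .mapToInt(Integer::intValue).sum();
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : splitsPerRack.entrySet()) {
            // rack's reducer count = its fraction of splits * total reducers
            int share = (int) Math.round(
                    (double) totalReducers * e.getValue() / totalSplits);
            out.put(e.getKey(), share);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> splits = new LinkedHashMap<>();
        splits.put("rack1", 70);  // rack1 holds 70% of the input splits
        splits.put("rack2", 30);
        System.out.println(reducersPerRack(splits, 10)); // {rack1=7, rack2=3}
    }
}
```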

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
