[ 
https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619345#action_12619345
 ] 

Joydeep Sen Sarma commented on HADOOP-3136:
-------------------------------------------

hmm - that would make things very complicated indeed. i really meant a queue - 
FIFO. the TT would just run serially off the queue subject to available slots. 
the JT decides the ordering off the queue entirely.
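
a minimal sketch of that TT-side FIFO, in Java. all names here 
(FifoTaskRunnerSketch, enqueue, launchRunnable) are hypothetical and not 
existing Hadoop classes; the only assumptions are that the JT hands over 
tasks already ordered and that the TT has a fixed number of slots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: a TaskTracker-side FIFO of tasks pre-ordered by the JT.
class FifoTaskRunnerSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int totalSlots;
    private int busySlots = 0;

    FifoTaskRunnerSketch(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    // Tasks arrive in the order chosen by the JT; the TT never reorders them.
    void enqueue(List<String> tasksInJtOrder) {
        pending.addAll(tasksInJtOrder);
    }

    // Launch strictly from the head of the queue while slots are free.
    List<String> launchRunnable() {
        List<String> launched = new ArrayList<>();
        while (busySlots < totalSlots && !pending.isEmpty()) {
            launched.add(pending.poll());
            busySlots++;
        }
        return launched;
    }

    // Frees one slot when a running task completes.
    void taskFinished() {
        if (busySlots > 0) {
            busySlots--;
        }
    }

    int queued() {
        return pending.size();
    }
}
```

the TT would call launchRunnable() after every enqueue() and every 
taskFinished(), so a freed slot is refilled without waiting for a heartbeat.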

with a single runnable queue - the major downside is that it becomes much 
harder to quickly respond to high priority tasks. so - one would then have to 
invent multiple queues (reflecting JT internal data structures) of different 
priorities.
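
the multiple-queue variant could look roughly like the sketch below. the 
Priority levels and the class name are made up for illustration and are not 
meant to mirror the JT's actual data structures; each level is itself FIFO 
and a higher level is always drained first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: one FIFO per priority level, highest level served first.
class PriorityQueuesSketch {
    // Declaration order is the service order (HIGH before NORMAL before LOW).
    enum Priority { HIGH, NORMAL, LOW }

    private final Map<Priority, Deque<String>> queues = new EnumMap<>(Priority.class);

    PriorityQueuesSketch() {
        for (Priority p : Priority.values()) {
            queues.put(p, new ArrayDeque<>());
        }
    }

    void enqueue(Priority p, String taskId) {
        queues.get(p).addLast(taskId);
    }

    // Returns the next task to run, or null when every queue is empty.
    String next() {
        for (Priority p : Priority.values()) {
            String task = queues.get(p).pollFirst();
            if (task != null) {
                return task;
            }
        }
        return null;
    }
}
```

a high priority task enqueued late still jumps ahead of everything queued at 
lower levels, which is the quick response a single queue cannot give.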

an entirely different line of attack may be to think about this as a JT 
scale-out (as opposed to performance) problem and figure out how to have multiple 
JTs. a hierarchical one is easy to think of - there's a master JT and a JT per 
rack perhaps. there is still some similarity with the previous scheme in that 
both these levels of trackers would need multiple internal priority queues. but 
the TT/rack-JT communication would still be high frequency (as today) - in 
which case - schemes that call for increasing heartbeat rate would be entirely 
feasible (since there's one JT per rack). 


> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We probably could give the tasktracker multiple tasks subject to the max 
> number of free slots it has - for maps we could assign it data local tasks. 
> We could probably run some logic to decide what to give it if we run out of 
> data local tasks (e.g., tasks from overloaded racks, tasks that have least 
> locality, etc.). In addition to maps, if it has reduce slots free, we could 
> give it reduce task(s) as well. Again for reduces we could probably run some 
> logic to give more tasks to nodes that are closer to nodes running most maps 
> (assuming data generated is proportional to the number of maps). For example, if 
> rack1 has 70% of the input splits, and we know that most maps are data/rack 
> local, we try to schedule ~70% of the reducers there.
> Thoughts?
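
A rough Java sketch of the proposal in the description above, under stated 
assumptions: the class and method names are hypothetical (not Hadoop APIs), 
tasks are plain string ids, and the caller already knows which pending maps 
are data-local to the heartbeating tracker. It fills free map slots with 
local tasks first, then falls back to other tasks, and computes a rack's 
share of reducers in proportion to its share of input splits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of multi-task assignment in one heartbeat response.
class HeartbeatAssignmentSketch {

    // Picks up to freeSlots map tasks, data-local ones first.
    // Assigned tasks are removed from the (mutable) pending lists.
    static List<String> assignMaps(int freeSlots,
                                   List<String> localPending,
                                   List<String> otherPending) {
        List<String> assigned = new ArrayList<>();
        takeUpTo(freeSlots, localPending, assigned);
        takeUpTo(freeSlots - assigned.size(), otherPending, assigned);
        return assigned;
    }

    // Moves at most n tasks from the head of 'from' into 'to'.
    private static void takeUpTo(int n, List<String> from, List<String> to) {
        Iterator<String> it = from.iterator();
        while (n > 0 && it.hasNext()) {
            to.add(it.next());
            it.remove();
            n--;
        }
    }

    // Reducers for a rack, proportional to its share of input splits.
    static int reducersForRack(int totalReducers, int rackSplits, int totalSplits) {
        return (int) Math.round((double) totalReducers * rackSplits / totalSplits);
    }
}
```

With 10 reducers and a rack holding 70 of 100 splits, reducersForRack returns 
7, matching the ~70% figure in the description. The fallback policy here is 
simply "next in list"; the rack-load and locality heuristics mentioned in the 
description would replace that step.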

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
