[
https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618476#action_12618476
]
Joydeep Sen Sarma commented on HADOOP-3136:
-------------------------------------------
hey folks - there are some downsides to what's proposed here:
- it's likely that on an idle cluster with a series of new jobs, each new job
would end up on a subset of nodes (the first nodes to send heartbeats after job
arrival). this is, generally speaking, bad:
  - jobs have different characteristics and it's good to spread them around (esp.
pertinent to memory usage)
  - reduce tasks have bursty network/disk traffic patterns and in general
scheduling multiple of them on the same node is bad
  - map locality could get worse (would we schedule non-local tasks even if a node
advertises availability of multiple slots? this is a very hard problem to solve)
the idea of dispatching multiple scheduling decisions in one shot is cool. but
making multiple scheduling decisions while considering only the availability of
a single node is not that great. the current scheduler limps along with this
badness since it only makes one decision at a time (so what's locally optimal
is usually globally optimal - esp. as far as map locality is concerned). but
that would increasingly not be the case when multiple decisions need to be made
in one shot.
here's a slightly different - but even easier - way of solving this that avoids
these downsides:
- make the heartbeat rate from the TT proportional to the number of idle slots
on the TT (equivalently, make the heartbeat interval inversely proportional).
- the exact formula is approx.: send I/S heartbeats per second - i.e. one every
S/I seconds - on behalf of the idle slots from each TT (where I is the number of
idle slots and S is the average task completion time, either parameterized or
observed).
The reasoning is as follows:
- in a busy cluster (lots of tasks in and out), there is no problem with
delays in dispatching tasks. And we know that scheduling decisions are
reasonably satisfactory today (or at least that is an issue outside the
scope of this Jira).
- in a busy cluster - if the average task completion time is S and the total
number of slots per TT is N - then on average each TT sends N/S heartbeats per
second (one every S/N seconds)
- so the idea is to make an idle cluster mimic a busy cluster as far as TT/JT
communication goes. We do this by pretending that the idle slots are actually
busy - and thereby sending heartbeats every S/I seconds on behalf of the idle
slots.
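to make this concrete - a minimal sketch of the pacing logic (hypothetical
class/method names, not actual TaskTracker code; S is assumed to be available
either as a config knob or an observed running average):
{code:java}
// sketch only - hypothetical names, not actual TaskTracker code.
// S = average task completion time in seconds (parameterized or observed);
// I = number of currently idle slots on this TT. each idle slot pretends
// to be a busy slot that completes a task every S seconds, so the TT
// sends one extra heartbeat every S/I seconds.
public class IdleSlotHeartbeatPacer {

  private final double avgTaskCompletionSecs; // S

  public IdleSlotHeartbeatPacer(double avgTaskCompletionSecs) {
    this.avgTaskCompletionSecs = avgTaskCompletionSecs;
  }

  /** milliseconds until the next idle-slot heartbeat for this TT */
  public long nextHeartbeatDelayMs(int idleSlots) {
    if (idleSlots <= 0) {
      // fully busy TT: no extra heartbeats needed - completion-driven
      // heartbeats already arrive at roughly N/S per second
      return Long.MAX_VALUE;
    }
    // I idle slots => I/S heartbeats per second => one every S/I seconds
    return Math.round(avgTaskCompletionSecs * 1000.0 / idleSlots);
  }
}
{code}
e.g. with S = 60s, a TT with 4 idle slots would heartbeat every 15 seconds, and
a fully idle 8-slot TT every 7.5 seconds - exactly the rate its slots would
generate if they were all busy.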
there should not be any impact on JT scalability or overall network traffic.
the network and JT should be able to bear the traffic that would result from the
cluster being 100% busy (and that's what we are emulating here). in addition -
where we are adding traffic (when the cluster is idling) - other (DFS/shuffle)
traffic would be lower.
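a quick sanity check on the load claim, with made-up numbers (1000 TTs, 8 slots
each, S = 60s; all hypothetical):
{code:java}
// back-of-envelope check with made-up numbers: the aggregate heartbeat
// rate at the JT is N/S per TT regardless of utilization, since busy
// slots drive ~B/S hb/s, idle slots are paced at I/S hb/s, and B + I = N.
public class HeartbeatLoadCheck {
  public static void main(String[] args) {
    int trackers = 1000;       // hypothetical cluster size
    int slotsPerTT = 8;        // N
    double avgTaskSecs = 60.0; // S
    for (int idle : new int[] {0, 4, 8}) {
      int busy = slotsPerTT - idle;
      double perTT = busy / avgTaskSecs + idle / avgTaskSecs;
      System.out.printf("idle slots=%d: %.1f heartbeats/sec cluster-wide%n",
          idle, trackers * perTT);
    }
  }
}
{code}
this prints ~133.3 heartbeats/sec in all three cases - the JT sees the same
load whether the cluster is busy, half idle, or fully idle.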
longer term - we should try to separate scheduling (which should consider
global resource availability) from dispatching (which is a per-node activity).
But this seems like a much bigger problem.
> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
> Key: HADOOP-3136
> URL: https://issues.apache.org/jira/browse/HADOOP-3136
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assignee: Arun C Murthy
> Fix For: 0.19.0
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We probably could give the tasktracker multiple tasks subject to the max
> number of free slots it has - for maps we could assign it data local tasks.
> We could probably run some logic to decide what to give it if we run out of
> data local tasks (e.g., tasks from overloaded racks, tasks that have least
> locality, etc.). In addition to maps, if it has reduce slots free, we could
> give it reduce task(s) as well. Again for reduces we could probably run some
> logic to give more tasks to nodes that are closer to nodes running most maps
> (assuming data generated is proportional to the number of maps). For example,
> if rack1 has 70% of the input splits, and we know that most maps are data/rack
> local, we try to schedule ~70% of the reducers there.
> Thoughts?
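for reference, the multi-assignment idea in the description above boils down to
a loop like the following (a sketch with hypothetical stand-in types - not the
actual JobTracker/TaskTracker API):
{code:java}
import java.util.ArrayList;
import java.util.List;

// hypothetical stand-in types - not the real mapred classes
interface Task {}

interface TrackerStatus {
  String host();
  int freeMapSlots();
  int freeReduceSlots();
}

interface JobQueue {
  Task pollDataLocalMap(String host); // map with a split on this host, or null
  Task pollFallbackMap();             // e.g. overloaded-rack / least-locality map
  Task pollReduce();                  // pending reduce, or null
}

class MultiAssignSketch {
  // fill every free slot in one heartbeat: data-local maps first,
  // then fallback maps, then reduces
  List<Task> assignTasks(TrackerStatus tt, JobQueue jobs) {
    List<Task> assigned = new ArrayList<Task>();
    for (int i = 0; i < tt.freeMapSlots(); i++) {
      Task t = jobs.pollDataLocalMap(tt.host());
      if (t == null) t = jobs.pollFallbackMap();
      if (t == null) break;
      assigned.add(t);
    }
    for (int i = 0; i < tt.freeReduceSlots(); i++) {
      Task t = jobs.pollReduce();
      if (t == null) break;
      assigned.add(t);
    }
    return assigned;
  }
}
{code}
note that this is exactly the per-node greedy pattern the comment above warns
about: each poll sees only this TT's free slots, not global resource
availability.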