[jira] Commented: (HADOOP-657) Free temporary space should be modelled better

Arun C Murthy (JIRA) Thu, 16 Nov 2006 09:46:00 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-657?page=comments#action_12450470 ]

Arun C Murthy commented on HADOOP-657:
--------------------------------------


I see the value in Doug's suggestion. For example, at some point in the future
we might also report metrics such as CPU load and VM stats, which would let the
JobTracker make 'smarter' decisions about which tasks to assign to which
TaskTrackers, e.g. CPU-bound tasks to IO-laden TTs and vice versa.

I do agree that it might be a very futuristic scenario, but the point is to 
keep the infrastructure robust when we can...

> Free temporary space should be modelled better
> ----------------------------------------------
>
>                 Key: HADOOP-657
>                 URL: http://issues.apache.org/jira/browse/HADOOP-657
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>
> Currently, there is a configurable size that must be free for a task tracker 
> to accept a new task. However, that isn't a very good model of what the task 
> is likely to take. I'd like to propose:
> Map tasks:
>   totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
> Reduce tasks:
>   totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces
> where totalInputSize is the total size of all the map inputs for the given job.
> To start a new task, we require:
>   newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation)
>     <= free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);
> So in English, we will model the expected sizes of tasks and only take tasks 
> that should leave us a 10% margin. With:
> map.output.growth.factor -- the size of the transient data relative to the 
> map inputs
> mapred.max.scratch.allocation -- the maximum fraction of our free disk we 
> want to allocate to tasks.
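
As a rough illustration of the model proposed in the description above, here is
a minimal Java sketch. It is not taken from any Hadoop patch: the class
ScratchSpaceModel and its method names are hypothetical, the two configuration
values are passed in as plain arguments rather than read from a Configuration
object, and the admission check is read as "projected usage must not exceed the
allowed share of free disk", which is what the 10% margin wording implies.

```java
/**
 * Hypothetical sketch of the scratch-space model from the issue description.
 * Names and signatures are illustrative assumptions, not Hadoop APIs.
 */
final class ScratchSpaceModel {

    private ScratchSpaceModel() {}

    /** Expected scratch space (bytes) for one map task. */
    static double mapAllocation(long totalInputSize, double growthFactor, int numMaps) {
        return totalInputSize * growthFactor / numMaps;
    }

    /** Expected scratch space (bytes) for one reduce task. */
    static double reduceAllocation(long totalInputSize, double growthFactor, int numReduces) {
        return totalInputSize * 2.0 * growthFactor / numReduces;
    }

    /**
     * Decide whether a tracker should accept a new task.
     *
     * @param newTaskAllocation    expected scratch space of the candidate task
     * @param allocations          expected scratch space of each running task
     * @param done                 progress of each running task, in [0, 1]
     * @param freeDisk             free scratch space on the tracker, in bytes
     * @param maxScratchAllocation fraction of free disk usable by tasks
     *                             (mapred.max.scratch.allocation, e.g. 0.90)
     */
    static boolean canAccept(double newTaskAllocation,
                             double[] allocations,
                             double[] done,
                             long freeDisk,
                             double maxScratchAllocation) {
        if (allocations.length != done.length) {
            throw new IllegalArgumentException("allocations and done must have equal length");
        }
        // Space that running tasks are still expected to consume.
        double outstanding = 0.0;
        for (int i = 0; i < allocations.length; i++) {
            outstanding += (1.0 - done[i]) * allocations[i];
        }
        // Assumed direction: accept only if projected usage fits in the allowed share.
        return newTaskAllocation + outstanding <= freeDisk * maxScratchAllocation;
    }
}
```

With a 1000-byte job input, a growth factor of 1.0 and 10 maps, each map is
modelled at 100 bytes. A tracker with 1000 bytes free and one half-finished
200-byte task would accept it, since 100 + 100 is well under the 900-byte limit.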

