[ http://issues.apache.org/jira/browse/HADOOP-657?page=comments#action_12450470 ]

Arun C Murthy commented on HADOOP-657:
--------------------------------------

I see the value in Doug's suggestion... e.g. at some point in the future we might also put in metrics like CPU load, VM stats etc., and this would let the JobTracker make 'smarter' decisions about which task to assign to which TaskTrackers, i.e. CPU-bound tasks to IO-laden TTs and vice versa.

I do agree that it might be a very futuristic scenario, but the point is to keep the infrastructure robust when we can...

> Free temporary space should be modelled better
> ----------------------------------------------
>
>          Key: HADOOP-657
>          URL: http://issues.apache.org/jira/browse/HADOOP-657
>          Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>          Affects Versions: 0.7.2
>          Reporter: Owen O'Malley
>          Assigned To: Arun C Murthy
>
> Currently, there is a configurable size that must be free for a task tracker
> to accept a new task. However, that isn't a very good model of what the task
> is likely to take. I'd like to propose:
>
> Map tasks:
>   totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
> Reduce tasks:
>   totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces
>
> where totalInputSize is the size of all the maps' inputs for the given job.
>
> To start a new task,
>   newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation)
>     <=
>   free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);
>
> So in English, we will model the expected sizes of tasks and only take tasks
> that should leave us a 10% margin. With:
>   map.output.growth.factor -- the size of the transient data relative
>     to the map inputs
>   mapred.max.scratch.allocation -- the maximum fraction of our free disk we
>     want to allocate to tasks.
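
For illustration only, the arithmetic in the quoted proposal can be sketched in plain Java. The class and method names here (ScratchSpaceModel, mapAllocation, reduceAllocation, outstanding, canAccept) are hypothetical and are not part of Hadoop. The configuration values are passed in as plain doubles rather than read through conf.getFloat. The check is read as "the modelled demand must fit within the allowed fraction of free disk", which is what the 10% margin wording implies.

```java
// Hypothetical sketch of the proposed scratch-space model; not Hadoop code.
final class ScratchSpaceModel {
    private ScratchSpaceModel() {}

    // Expected scratch space for one map task:
    // totalInputSize * growthFactor / numMaps
    static double mapAllocation(long totalInputSize, double growthFactor, int numMaps) {
        return totalInputSize * growthFactor / numMaps;
    }

    // Expected scratch space for one reduce task:
    // totalInputSize * 2 * growthFactor / numReduces
    static double reduceAllocation(long totalInputSize, double growthFactor, int numReduces) {
        return 2.0 * totalInputSize * growthFactor / numReduces;
    }

    // Space still expected to be consumed by running tasks:
    // sum over running tasks of (1.0 - done) * allocation.
    // allocations[i] and done[i] describe the same running task.
    static double outstanding(double[] allocations, double[] done) {
        double sum = 0.0;
        for (int i = 0; i < allocations.length; i++) {
            sum += (1.0 - done[i]) * allocations[i];
        }
        return sum;
    }

    // Accept a new task only if its allocation plus the outstanding demand
    // fits within maxScratchAllocation (e.g. 0.90) of the free disk.
    static boolean canAccept(double newTaskAllocation, double[] allocations,
                             double[] done, long freeDisk, double maxScratchAllocation) {
        return newTaskAllocation + outstanding(allocations, done)
                <= freeDisk * maxScratchAllocation;
    }
}
```

With 1000 bytes of free disk and one running task of allocation 200 that is half done, a new task of allocation 100 needs 100 + 100 = 200 bytes against a budget of 900, so it is accepted. With only 200 bytes free, the budget drops to 180 and the task is refused.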