[ https://issues.apache.org/jira/browse/MAPREDUCE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027454#comment-13027454 ]

Elton Tian commented on MAPREDUCE-1603:
---------------------------------------

I like the idea, but I don't think we can set the slots from hardware 
parameters alone; it's application dependent. For example, suppose you have a 
quad-core cluster and a dual-core cluster, both with the same disks and 
interconnect. If you run a "Grep" with the same slot counts on both clusters, 
I'd guess the processing times are similar. But if you change the application 
to "Sort", still using the same number of slots, there could be a noticeable 
difference. 

So I guess that to arrive at a reasonable slot count, we need to actually run 
the application. Somehow.

> Add a plugin class for the TaskTracker to determine available slots
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1603
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1603
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Currently the number of available map and reduce slots is determined by the 
> configuration. MAPREDUCE-922 has proposed working things out automatically, 
> but that is going to depend a lot on the specific tasks - hard to get right 
> for everyone.
> There is a Hadoop cluster near me that would like to use CPU time from other 
> machines in the room, machines which cannot offer storage, but which will 
> have spare CPU time when they aren't running code scheduled with a grid 
> scheduler. The nodes could run a TT which would report a dynamic number of 
> slots, the number depending upon the current grid workload. 
> I propose we add a plugin point here, so that different people can develop 
> plugin classes that determine the number of available slots based on 
> workload, RAM, CPU, power budget, thermal parameters, etc. There's lots of 
> room for customisation and improvement, and by having it as a plugin, people 
> get to integrate with whatever datacentre schedulers they have without 
> Hadoop itself needing to be altered. The base implementation would behave as 
> today: subtract the number of active map and reduce slots from the 
> configured values, and push that out. 
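The plugin point sketched in the quoted proposal could look roughly like the
following. This is a hypothetical illustration only: the interface name, the
class names, and the grid-load scaling policy are all assumptions for the sake
of the example, not an actual Hadoop API. The default implementation mirrors
today's behaviour of reporting fixed, configured slot counts; the second
policy shows the grid-scheduler scenario from the issue description.

```java
// Hypothetical sketch of a TaskTracker slot-reporting plugin point.
// All names are illustrative; this is not part of any real Hadoop API.
interface SlotPolicy {
    int availableMapSlots();
    int availableReduceSlots();
}

// Default policy: today's behaviour -- fixed, configured slot counts.
class ConfiguredSlotPolicy implements SlotPolicy {
    private final int mapSlots;
    private final int reduceSlots;

    ConfiguredSlotPolicy(int mapSlots, int reduceSlots) {
        this.mapSlots = mapSlots;
        this.reduceSlots = reduceSlots;
    }

    public int availableMapSlots()    { return mapSlots; }
    public int availableReduceSlots() { return reduceSlots; }
}

// Example dynamic policy: scale slots down as an external grid
// scheduler's reported load on this node rises (load in [0, 1]).
class GridAwareSlotPolicy implements SlotPolicy {
    private final int maxMapSlots;
    private final int maxReduceSlots;
    private volatile double gridLoad; // updated by external monitoring

    GridAwareSlotPolicy(int maxMapSlots, int maxReduceSlots) {
        this.maxMapSlots = maxMapSlots;
        this.maxReduceSlots = maxReduceSlots;
    }

    void setGridLoad(double load) {
        // Clamp to [0, 1] so a bad reading can't produce negative slots.
        gridLoad = Math.min(1.0, Math.max(0.0, load));
    }

    public int availableMapSlots() {
        return (int) Math.floor(maxMapSlots * (1.0 - gridLoad));
    }

    public int availableReduceSlots() {
        return (int) Math.floor(maxReduceSlots * (1.0 - gridLoad));
    }
}

public class SlotPolicyDemo {
    public static void main(String[] args) {
        GridAwareSlotPolicy policy = new GridAwareSlotPolicy(8, 4);
        policy.setGridLoad(0.5); // grid is half busy on this node
        System.out.println(policy.availableMapSlots());    // 4
        System.out.println(policy.availableReduceSlots()); // 2
    }
}
```

The TaskTracker would simply query whatever `SlotPolicy` is configured each
heartbeat, so the scheduler sees a slot count that tracks the node's spare
capacity without Hadoop itself knowing anything about the grid scheduler.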

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
