[
https://issues.apache.org/jira/browse/MAPREDUCE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027454#comment-13027454
]
Elton Tian commented on MAPREDUCE-1603:
---------------------------------------
I like the idea, but I don't think we can set the slots from hardware
parameters; rather, it's application dependent. For example, say you have a
quad-core cluster and a dual-core cluster. Both clusters have the same disks
and interconnect. When you run a "Grep", if you apply the same slot numbers on
both clusters, I guess the processing times are similar. If you change your
application to "Sort", still using the same number of slots, then there could
be a noticeable difference.
So I guess, to get a reasonable number of slots, we need to actually run the
application. Somehow.
> Add a plugin class for the TaskTracker to determine available slots
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-1603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1603
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: tasktracker
> Affects Versions: 0.22.0
> Reporter: Steve Loughran
> Priority: Minor
>
> Currently the number of available map and reduce slots is determined by the
> configuration. MAPREDUCE-922 has proposed working things out automatically,
> but that is going to depend a lot on the specific tasks, which makes it hard
> to get right for everyone.
> There is a Hadoop cluster near me that would like to use CPU time from other
> machines in the room, machines which cannot offer storage, but which will
> have spare CPU time when they aren't running code scheduled with a grid
> scheduler. The nodes could run a TT which would report a dynamic number of
> slots, the number depending upon the current grid workload.
> I propose we add a plugin point here, so that different people can develop
> plugin classes that determine the number of available slots based on
> workload, RAM, CPU, power budget, thermal parameters, etc. Lots of space for
> customisation and improvement. And by having it as a plugin, people get to
> integrate with whatever datacentre schedulers they have without Hadoop itself
> needing to be altered. The base implementation would be as today: subtract
> the number of active map and reduce slots from the configured values, and
> push that out.
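
To make the proposed plugin point concrete, here is a rough Java sketch. It
is only an illustration: SlotProvider, ConfiguredSlotProvider and
GridAwareSlotProvider are hypothetical names, not existing Hadoop classes,
and the IntSupplier stands in for whatever a site's grid scheduler would
report. Clamping the result at zero is also an assumption of the sketch.

```java
import java.util.function.IntSupplier;

/** Hypothetical plugin point; not part of any Hadoop release. */
interface SlotProvider {
    /** Map slots the TaskTracker may still offer, given the active map tasks. */
    int availableMapSlots(int activeMapTasks);

    /** Reduce slots the TaskTracker may still offer, given the active reduce tasks. */
    int availableReduceSlots(int activeReduceTasks);
}

/** Base behaviour as described in the issue: configured value minus active slots. */
final class ConfiguredSlotProvider implements SlotProvider {
    private final int configuredMapSlots;
    private final int configuredReduceSlots;

    ConfiguredSlotProvider(int configuredMapSlots, int configuredReduceSlots) {
        this.configuredMapSlots = configuredMapSlots;
        this.configuredReduceSlots = configuredReduceSlots;
    }

    @Override
    public int availableMapSlots(int activeMapTasks) {
        // Assumption: never report a negative slot count.
        return Math.max(0, configuredMapSlots - activeMapTasks);
    }

    @Override
    public int availableReduceSlots(int activeReduceTasks) {
        return Math.max(0, configuredReduceSlots - activeReduceTasks);
    }
}

/** Example dynamic plugin: withholds slots the external grid workload is using. */
final class GridAwareSlotProvider implements SlotProvider {
    private final SlotProvider base;
    // Hypothetical hook: how many slots' worth of CPU the grid scheduler holds now.
    private final IntSupplier slotsUsedByGrid;

    GridAwareSlotProvider(SlotProvider base, IntSupplier slotsUsedByGrid) {
        this.base = base;
        this.slotsUsedByGrid = slotsUsedByGrid;
    }

    @Override
    public int availableMapSlots(int activeMapTasks) {
        return Math.max(0, base.availableMapSlots(activeMapTasks) - slotsUsedByGrid.getAsInt());
    }

    @Override
    public int availableReduceSlots(int activeReduceTasks) {
        return Math.max(0, base.availableReduceSlots(activeReduceTasks) - slotsUsedByGrid.getAsInt());
    }
}
```

Wrapping the base provider keeps today's behaviour as the default, while a
site-specific class adjusts the reported number on every call. An
application-aware provider, as suggested in the comment above, could be
plugged in the same way.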