[
https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292933#comment-13292933
]
Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------
Hi Arun,
I'm excited to see this started -- I'm quite interested in the multi-resource
scheduling problem. After reading through the patch, I have a few questions for
you; hopefully this feedback will be helpful.
First off, I want to confirm my understanding is correct: this patch is
designed to allocate resources to jobs within the same capacity queue based on
the DRF-inspired ordering of their need for resources. It is not designed to do
weighted DRF for the complete cluster. If I'm mistaken, perhaps some of my
feedback may not apply.
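To make that understanding concrete, here is a minimal Python sketch of what a
DRF-inspired ordering within a single queue could look like. It is not taken
from the patch: the names Job, dominant_share, and drf_order are hypothetical,
and it assumes each job is ranked by its dominant share (the larger of its
memory and CPU fractions of the queue), smallest first.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    memory_mb: int  # memory currently allocated to the job
    vcores: int     # CPU cores currently allocated to the job


def dominant_share(job, total_memory_mb, total_vcores):
    """Return the larger of the job's memory share and CPU share."""
    return max(job.memory_mb / total_memory_mb, job.vcores / total_vcores)


def drf_order(jobs, total_memory_mb, total_vcores):
    """Order jobs so the one with the smallest dominant share is served first."""
    return sorted(
        jobs,
        key=lambda j: dominant_share(j, total_memory_mb, total_vcores),
    )


# Hypothetical queue with 100 GB of memory and 50 cores.
jobs = [
    Job("a", memory_mb=20480, vcores=5),   # shares: 0.20 memory, 0.10 CPU
    Job("b", memory_mb=10240, vcores=20),  # shares: 0.10 memory, 0.40 CPU
    Job("c", memory_mb=5120, vcores=5),    # shares: 0.05 memory, 0.10 CPU
]
ordered = drf_order(jobs, total_memory_mb=102400, total_vcores=50)
print([j.name for j in ordered])
```

In this made-up example job "c" would be offered resources first, since its
dominant share (0.10, on CPU) is the smallest.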
1) Are you planning to change the definition of a queue's capacity? Currently,
it is defined as a fractional percentage of the parent queue's total memory.
Alternatively, queues could be specified with a fractional percentage of each
resource. E.g., I could have one queue with "75% CPU and 50% RAM" and a second
with "25% CPU and 50% RAM".
2) Do you plan to change how spare capacity is allocated? My understanding is
that it's currently shared proportionally, based on the queue capacities, an
approach that seems like it would be intuitive for cluster operators. With a
multi-resource setup, however, running DRF on the pool of spare resources would
provide higher utilization. (I can provide an example of this if you'd like.)
3) Are you planning to support priorities or weights within the queues? IIRC,
this was supported in the MR1 scheduler, and the DRF paper describes a weighted
extension.
4) Lastly, with the increasing flexibility of the YARN scheduler, I think it
makes sense to better support heterogeneous clusters. Currently,
yarn.nodemanager.resource.memory-mb is a constant across the cluster, but with
a scheduler capable of packing differently shaped resource containers onto each
node, heterogeneous nodes would be a natural extension. (This is more of an
observation than a question. :-)
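As an illustration of the per-resource capacities raised in question 1, the
following Python sketch turns per-resource fractions into absolute guarantees.
It is only a sketch under stated assumptions: the parent totals (40 cores,
80 GB), the queue names, and the helper guaranteed_resources are hypothetical
and do not correspond to any existing CapacityScheduler setting.

```python
# Hypothetical parent queue totals.
PARENT_VCORES = 40
PARENT_MEMORY_MB = 81920

# Per-resource capacity fractions, using the figures from question 1.
QUEUE_FRACTIONS = {
    "queue1": {"cpu": 0.75, "memory": 0.50},
    "queue2": {"cpu": 0.25, "memory": 0.50},
}


def guaranteed_resources(fractions, parent_vcores, parent_memory_mb):
    """Convert per-resource fractions into absolute per-queue guarantees."""
    # Sibling queues must not be promised more than 100% of either resource.
    for resource in ("cpu", "memory"):
        total = sum(q[resource] for q in fractions.values())
        if total > 1.0:
            raise ValueError(f"{resource} fractions sum to {total}, above 1.0")
    return {
        name: {
            "vcores": int(q["cpu"] * parent_vcores),
            "memory_mb": int(q["memory"] * parent_memory_mb),
        }
        for name, q in fractions.items()
    }


guarantees = guaranteed_resources(
    QUEUE_FRACTIONS, PARENT_VCORES, PARENT_MEMORY_MB
)
print(guarantees)
```

With these assumed totals, the first queue would be guaranteed 30 cores and
40 GB, and the second 10 cores and 40 GB.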
Looking forward to further discussions.
cheers,
Andrew
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-4327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv2, resourcemanager, scheduler
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Attachments: MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several
> applications (MPI et al) to specify not just memory but also CPU (cores) for
> their resource requirements. Thus, it would be useful to the
> CapacityScheduler to account for both.