[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292933#comment-13292933
 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

Hi Arun,

I'm excited to see this started -- I'm quite interested in the multi-resource 
scheduling problem. After reading through the patch, I have a few questions for 
you; hopefully this feedback will be helpful.

First off, I want to confirm my understanding is correct: this patch is 
designed to allocate resources to jobs within the same capacity queue based on 
the DRF-inspired ordering of their need for resources. It is not designed to do 
weighted DRF for the complete cluster. If I'm mistaken, perhaps some of my 
feedback may not apply.
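
To make that reading concrete, here is a minimal sketch of a DRF-style ordering of jobs inside one queue. It is not taken from the patch: the `Job` class, the function names, and the queue totals are all hypothetical, chosen only to show the idea of serving the job with the smallest dominant share first.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record of what a job currently holds inside its queue.
    name: str
    cpu_used: float
    mem_used_mb: float


def dominant_share(job, queue_cpu, queue_mem_mb):
    """Largest fraction of any single queue resource held by the job."""
    return max(job.cpu_used / queue_cpu, job.mem_used_mb / queue_mem_mb)


def drf_order(jobs, queue_cpu, queue_mem_mb):
    """Sort jobs so the one with the smallest dominant share comes first."""
    return sorted(jobs, key=lambda j: dominant_share(j, queue_cpu, queue_mem_mb))


# Assumed queue totals: 16 cores and 32 GB.
jobs = [
    Job("cpu-heavy", cpu_used=6, mem_used_mb=4096),
    Job("mem-heavy", cpu_used=1, mem_used_mb=16384),
    Job("small", cpu_used=1, mem_used_mb=2048),
]
order = [j.name for j in drf_order(jobs, queue_cpu=16, queue_mem_mb=32768)]
print(order)
```

Here "cpu-heavy" is dominated by CPU (6/16) and "mem-heavy" by memory (16384/32768), so the job with the smallest dominant share, "small", would be offered resources first.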

1) Are you planning to change the definition of a queue's capacity? Currently, 
it is defined as a fractional percentage of the parent queue's total memory. 
Alternatively, queues could be specified with a fractional percentage of each 
resource. e.g., I could have one queue with "75% CPU and 50% RAM" and a second 
with "25% CPU and 50% RAM".
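
A rough sketch of what such a per-resource definition could mean in absolute terms follows. The parent totals, the queue names, and the `absolute_capacities` helper are illustrative assumptions, not existing CapacityScheduler configuration; the check that sibling fractions do not exceed 100% of any resource is likewise assumed.

```python
# Hypothetical parent queue totals.
parent = {"cpu": 40, "memory_mb": 163840}

# Per-resource fractions for two sibling queues, as in the example above.
queue_fractions = {
    "queue_a": {"cpu": 0.75, "memory_mb": 0.50},
    "queue_b": {"cpu": 0.25, "memory_mb": 0.50},
}


def absolute_capacities(parent, fractions):
    """Turn per-resource fractions into absolute amounts of each resource."""
    for resource in parent:
        total = sum(q[resource] for q in fractions.values())
        # Assumed rule: siblings may not claim more than the parent has.
        if total > 1.0 + 1e-9:
            raise ValueError(f"fractions for {resource} exceed 100%")
    return {
        name: {resource: parent[resource] * frac for resource, frac in q.items()}
        for name, q in fractions.items()
    }


capacities = absolute_capacities(parent, queue_fractions)
for name, caps in capacities.items():
    print(name, caps)
```

With these assumed totals, the first queue would be entitled to 30 cores and 80 GB, and the second to 10 cores and 80 GB.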

2) Do you plan to change how spare capacity is allocated? My understanding is 
that it's currently shared proportionally, based on the queue capacities, an 
approach that seems like it would be intuitive for cluster operators. With a 
multi-resource setup however, running DRF on the pool of spare resources would 
provide higher utilization. (I can provide an example of this if you'd like.)
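
One possible instance of this effect, as a sketch under stated assumptions: a spare pool of 10 cores and 10 GB, two queues of equal capacity, and per-task demands of <2 cores, 1 GB> and <1 core, 2 GB>. The "static split" below assumes each queue is limited to its capacity fraction of every spare resource, which simplifies the current behavior; the DRF version is a simple progressive-filling loop. All names and numbers are hypothetical.

```python
spare = {"cpu": 10, "mem_gb": 10}
fractions = {"A": 0.5, "B": 0.5}
demands = {
    "A": {"cpu": 2, "mem_gb": 1},  # CPU-heavy tasks
    "B": {"cpu": 1, "mem_gb": 2},  # memory-heavy tasks
}


def tasks_under_static_split(spare, fractions, demands):
    """Each queue may use only its fixed fraction of every spare resource."""
    return {
        q: min(int(spare[r] * fractions[q] // d[r]) for r in spare)
        for q, d in demands.items()
    }


def tasks_under_drf(spare, demands):
    """Progressive filling: next task goes to the lowest dominant share."""
    used = {r: 0 for r in spare}
    counts = {q: 0 for q in demands}
    active = set(demands)

    def dominant(q):
        return max(counts[q] * demands[q][r] / spare[r] for r in spare)

    while active:
        q = min(sorted(active), key=dominant)
        if all(used[r] + demands[q][r] <= spare[r] for r in spare):
            for r in spare:
                used[r] += demands[q][r]
            counts[q] += 1
        else:
            # This queue's next task no longer fits; stop offering to it.
            active.discard(q)
    return counts


def utilization(spare, demands, counts):
    """Fraction of each spare resource consumed by the launched tasks."""
    return {
        r: sum(counts[q] * demands[q][r] for q in counts) / spare[r]
        for r in spare
    }


static_counts = tasks_under_static_split(spare, fractions, demands)
drf_counts = tasks_under_drf(spare, demands)
print("static:", static_counts, utilization(spare, demands, static_counts))
print("drf:   ", drf_counts, utilization(spare, demands, drf_counts))
```

In this particular instance the static split launches two tasks per queue and leaves 40% of each resource idle, while the DRF loop launches three per queue and uses 90% of each. Other demand shapes will give different numbers.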

3) Are you planning to support priorities or weights within the queues? IIRC, 
this was supported in the MR1 scheduler, and the DRF paper describes a weighted 
extension.
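
For reference, the weighted variant can be sketched by dividing each resource share by a weight before taking the maximum. The function name, the per-resource weight dictionaries, and the sample numbers below are assumptions for illustration; a single per-job weight is the special case where all of a job's weights are equal.

```python
def weighted_dominant_share(usage, capacity, weights):
    """Dominant share with each resource share scaled down by its weight."""
    return max(usage[r] / capacity[r] / weights[r] for r in capacity)


# Assumed queue totals and two jobs holding identical resources.
capacity = {"cpu": 16, "memory_mb": 32768}
usage = {"cpu": 4, "memory_mb": 8192}

normal = weighted_dominant_share(usage, capacity, {"cpu": 1.0, "memory_mb": 1.0})
favored = weighted_dominant_share(usage, capacity, {"cpu": 2.0, "memory_mb": 2.0})
print(normal, favored)
```

With identical usage, the job with weight 2 ends up with half the weighted dominant share, so a scheduler that serves the smallest share first would favor it until it holds roughly twice the resources.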

4) Lastly, with the increasing flexibility of the YARN scheduler, I think it 
makes sense to better support heterogeneous clusters. Currently, 
yarn.nodemanager.resource.memory-mb is a constant across the cluster, but with 
a scheduler capable of packing differently shaped resource containers onto each 
node, heterogeneous nodes would be a natural extension. (This is more of an 
observation than a question. :-)


Looking forward to further discussions.

cheers,
Andrew


                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several 
> applications (MPI et al) to specify not just memory but also CPU (cores) for 
> their resource requirements. Thus, it would be useful to the 
> CapacityScheduler to account for both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
