[
https://issues.apache.org/jira/browse/MAPREDUCE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284926#comment-13284926
]
Robert Joseph Evans commented on MAPREDUCE-4256:
------------------------------------------------
@Radim,
I have been trying to think of any resource or resource request that does not
really fit into this model. There are a few that I have come up with. I don't
know if they should be supported or not. I am including them here mostly for
discussion to see what others think.
# optional resources. For example, I have a mapper that could really be sped
up by having access to a GPU card, but it can run without one. So if there is
one free I would like to use it, but if it isn't available I am OK without it.
This opens up a bit of a can of worms, because how do I indicate in the
request how much effort the scheduler should put into getting me that resource
before giving up?
# a subclass of this might be flexible resources. I would love to have 10GB
of RAM, because it would really make my process run fast with all of the
caching I would like to do, but I can get by with as little as 2GB. (There is
a rough sketch after this list of what a request covering these first two
cases might look like.)
# data locality/network accessible resources. I want to be near a given
resource but I don't necessarily need to be on the same box as the resource.
This could be used when reading from/writing to a filer, DB server, or even a
different, yet-to-be-scheduled container. This is a bit tricky because we
already try to do this, but in a different way. With each container we request
a node/rack that we would like to run on. The reality is that, at least for
MapReduce, we are requesting an HDFS block that we would like to run near, but
there are too many blocks to model each of them as a resource. Instead the
request was generalized to nodes and racks, and that would probably work for
most things, except for the case where I want to be near a yet-to-be-scheduled
container. Even that would work, so long as we only launch containers one at a
time.
# global resources. Things like bandwidth between datacenters for distcp
copies. Perhaps these could be modeled more as cluster-wide resources, but the
APIs would have to be set up assuming that the free/max amounts can change
without any input from the RM. Meaning the scheduler may need to be able to
deal with asking for those resources and then finding that they are no longer
available.
# default resources. For backwards compatibility with existing requests we
may want some resources, like CPU, to have a default value if one is not
explicitly given.
# tiered or dependent resources. With network, for example, if I am doing a
distcp across colos, I am not only using up network bandwidth on the box that
I am on, I am also using up bandwidth on the rack switch and on the link
between the colos. I am not a network engineer, so if I am completely wrong
with the example hopefully the general concept still holds. Do we want to
somehow be able to tie different resources together, so that a request for one
thing (intra-colo bandwidth) also implies a request for other resources? (The
second sketch after this list shows one way such a hierarchy could be
charged.)
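To make the first two items a bit more concrete, here is a very rough sketch
of what a request for optional/flexible resources could look like. None of
these classes exist in YARN today; the names and fields are made up purely for
discussion.
{code:java}
// Hypothetical sketch only -- none of these classes exist in YARN today.
// The idea is that a request names a hard minimum, a preferred amount, and
// some hint of how hard the scheduler should try before giving up.
public final class FlexibleResourceSpec {
  private final String name;     // e.g. "memory-mb", "gpu"
  private final long minimum;    // below this the request cannot be satisfied
  private final long preferred;  // the scheduler tries to get close to this
  private final double effort;   // 0.0 = purely opportunistic, 1.0 = wait for it

  public FlexibleResourceSpec(String name, long minimum, long preferred,
                              double effort) {
    this.name = name;
    this.minimum = minimum;
    this.preferred = preferred;
    this.effort = effort;
  }

  /** An optional resource is just a flexible one whose minimum is zero. */
  public static FlexibleResourceSpec optional(String name, long preferred,
                                              double effort) {
    return new FlexibleResourceSpec(name, 0L, preferred, effort);
  }

  public String getName()     { return name; }
  public long getMinimum()    { return minimum; }
  public long getPreferred()  { return preferred; }
  public double getEffort()   { return effort; }
}
{code}
With something like that, the 2GB-to-10GB memory case would be
new FlexibleResourceSpec("memory-mb", 2048, 10240, 0.5) and the opportunistic
GPU case would be FlexibleResourceSpec.optional("gpu", 1, 0.1). The hard part
is still pinning down what the effort number actually means to the scheduler.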
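For the tiered/dependent case, one way to think about it is a hierarchy of
resources where charging a leaf also charges every level above it (node NIC ->
rack switch -> inter-colo link). Again, this is just a sketch with made-up
names, not anything that exists today.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a resource that rolls up into parent resources, so
// that using bandwidth on a node also consumes bandwidth on the rack switch
// and on the inter-colo link above it.
public final class TieredResource {
  private final String name;
  private final TieredResource parent;  // null for a top-level resource
  private long available;

  public TieredResource(String name, long available, TieredResource parent) {
    this.name = name;
    this.available = available;
    this.parent = parent;
  }

  public String getName() { return name; }

  /** Charging a leaf resource implicitly charges every ancestor as well. */
  public boolean tryAllocate(long amount) {
    List<TieredResource> chain = new ArrayList<TieredResource>();
    for (TieredResource r = this; r != null; r = r.parent) {
      if (r.available < amount) {
        return false;                   // some tier is exhausted, reject
      }
      chain.add(r);
    }
    for (TieredResource r : chain) {
      r.available -= amount;
    }
    return true;
  }
}
{code}
So a request for bandwidth on a node would only succeed if the rack switch and
the inter-colo link above it also have that much free, which is roughly the
"one request implies others" behavior described above.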
I also have a few questions.
How are the plug-ins intended to adapt to different operating environments?
Determining the amount of free CPU on Windows is very different from doing it
on Linux. Would we have different plug-ins for different OS's? Would it be up
to the end user to configure it all appropriately or would we try to
auto-detect the environment and adjust automatically?
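To make the OS question concrete, one option would be something like the
sketch below: explicit configuration wins, otherwise we sniff os.name and pick
an implementation. ResourceCalculatorPlugin is the existing abstraction the
description mentions; the factory class, the config key, and the Windows
branch here are made up for illustration, and the exact packages may differ by
branch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin;
import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;

// Illustrative sketch only: auto-detect the platform, but let an explicitly
// configured plug-in override the detection.
public final class ResourceCalculatorPluginFactory {

  public static ResourceCalculatorPlugin create(Configuration conf) {
    // 1. An explicitly configured plug-in always wins.  (Made-up config key.)
    Class<? extends ResourceCalculatorPlugin> clazz =
        conf.getClass("yarn.nodemanager.resource-calculator.class",
                      null, ResourceCalculatorPlugin.class);
    if (clazz != null) {
      return ReflectionUtils.newInstance(clazz, conf);
    }
    // 2. Otherwise fall back to sniffing the operating system.
    String os = System.getProperty("os.name").toLowerCase();
    if (os.contains("linux")) {
      return new LinuxResourceCalculatorPlugin();   // reads /proc
    }
    if (os.contains("windows")) {
      // Would need a hypothetical Windows implementation (perf counters/WMI);
      // nothing like that exists today.
      return null;
    }
    return null;  // unknown platform: no resource reporting
  }
}
{code}
Auto-detection like that would at least give a sane default, with the config
knob as an escape hatch for platforms the detection does not know about.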
Do we want to handle enforcement of the resource limits as well? We currently
kill off containers that are using more memory than requested (with some
leeway). What about other things like CPU or network: do we kill the container
if it goes over, do we try to use the OS to restrict the amount used, do we
warn the user that they went over, or do we just ignore it and hope everyone is
a good citizen?
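To make the enforcement question concrete, here is a rough sketch of the kind
of per-resource policy a node-side monitor could apply. Today only the memory
case exists (kill once the container goes over its request, with some leeway);
everything else here is made up for discussion, not a proposal for specific
class names.
{code:java}
// Hypothetical sketch of per-resource enforcement in a node-side monitor.
// Only the memory/KILL case reflects what we actually do today.
public final class ResourceEnforcer {

  public enum Policy { KILL, THROTTLE, WARN, IGNORE }

  /** Returns true if the container should be killed for this resource. */
  public boolean check(String containerId, String resource,
                       long used, long limit, Policy policy) {
    if (used <= limit) {
      return false;                 // within its request, nothing to do
    }
    switch (policy) {
      case KILL:
        return true;                // what we do for memory today (with leeway)
      case THROTTLE:
        throttleViaOs(containerId, resource, limit);  // cgroups, tc, ...
        return false;
      case WARN:
        System.err.printf("Container %s exceeded %s: used %d, requested %d%n",
                          containerId, resource, used, limit);
        return false;
      case IGNORE:
      default:
        return false;               // hope everyone is a good citizen
    }
  }

  private void throttleViaOs(String containerId, String resource, long limit) {
    // Placeholder: the real mechanism would be OS-specific.
  }
}
{code}
The interesting part is that the right policy probably differs per resource,
which is exactly the open question above.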
> Improve resource scheduling
> ---------------------------
>
> Key: MAPREDUCE-4256
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4256
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Radim Kolar
>
> Currently the resource manager supports only the memory resource during
> container allocation.
> I propose the following improvements:
> 1. Add support for CPU utilization. Node CPU usage information can be
> obtained via ResourceCalculatorPlugin.
> 2. Add support for custom resources. The node configuration would contain
> something like:
> name=node.resource.GPU, value=1 (the node has 1 GPU).
> If a job needs a GPU for its computation, it adds a "GPU=1" requirement to
> its job config and the Resource Manager will allocate a container on a node
> with a GPU available.