Re: [GPU] [Allocation] "Scarce" Resource Allocation

Elizabeth Lingg Tue, 21 Jun 2016 13:40:42 -0700

Thanks, looking forward to the discussion and review of your document. The main
use case I see here is that some of our frameworks will want to request GPU
resources, and we want to make sure that those frameworks are able to
successfully launch tasks on agents with those resources. We want to be certain
that other frameworks that do not require GPUs will not consume all of the other
resources on those agents (i.e. cpu, disk, memory), which would leave the GPU
resources unallocated and the frameworks that require them unable to receive
them. As Ben Mahler mentioned, "(2) Because we do not have revocation
yet, if a framework decides to consume the non-GPU resources on a GPU machine,
it will prevent the GPU workloads from running!" This will occur for us in
clusters that have higher utilization and a mix of different types of
workloads. Smart task placement then becomes more relevant (i.e. we
want to be able to schedule with scarce resources successfully, and we may have
considerations like not scheduling too many I/O-bound workloads on a single
host, or more stringent requirements for scheduling persistent tasks).
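
A minimal sketch of the starvation scenario described above. The agent size,
the task sizes, and the names Agent and launch are assumptions made up for
illustration; this is not Mesos code.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent with a fixed amount of free resources."""
    cpus: int
    mem_gb: int
    gpus: int


def launch(agent, cpus, mem_gb, gpus=0):
    """Consume resources on the agent if the task fits; return True on success."""
    if agent.cpus < cpus or agent.mem_gb < mem_gb or agent.gpus < gpus:
        return False
    agent.cpus -= cpus
    agent.mem_gb -= mem_gb
    agent.gpus -= gpus
    return True


# Assumed machine profile: 8 cpus, 32 GB of memory, 4 GPUs.
gpu_agent = Agent(cpus=8, mem_gb=32, gpus=4)

# A framework that does not need GPUs is offered the agent first and keeps
# launching CPU-only tasks until nothing else fits.
cpu_tasks = 0
while launch(gpu_agent, cpus=2, mem_gb=8):
    cpu_tasks += 1

# The GPU framework cannot place even a small task afterwards, although every
# GPU on the agent is still idle. Without revocation it has to wait.
gpu_task_launched = launch(gpu_agent, cpus=1, mem_gb=4, gpus=1)
```

In this sketch the GPUs stay unallocated only because the cpu and memory they
need alongside them are gone, which is the case we want the allocator to avoid.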


 Elizabeth Lingg



> On Jun 20, 2016, at 7:24 PM, Guangya Liu <gyliu...@gmail.com> wrote:
> 
> I had some discussion with Ben M about the following two solutions:
> 
> 1) Ben M: Create sub-pools of resources based on machine profile and
> perform fair sharing / quota within each pool, plus a framework
> capability GPU_AWARE to let the allocator filter out scarce resources
> for some frameworks.
> 2) Guangya: Add new sorters for non-scarce resources, plus a framework
> capability GPU_AWARE to let the allocator filter out scarce resources for
> some frameworks.
> 
> These two solutions amount to the same thing: creating sub-pools of
> resources requires introducing a separate sorter for each sub-pool, so I
> will merge the two solutions into one.
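
One way to picture the merged solution, as a rough sketch only. It assumes two
pools ("gpu" and "default"), whole-agent offers, and a per-pool "sorter" that
simply prefers the framework with the fewest allocations in that pool. All
function and field names are hypothetical; only the GPU_AWARE capability name
comes from the discussion above.

```python
from collections import defaultdict

GPU_AWARE = "GPU_AWARE"


def pool_of(agent):
    """Assign an agent to a sub-pool based on its machine profile."""
    return "gpu" if agent["gpus"] > 0 else "default"


def eligible(framework, pool):
    """Only GPU_AWARE frameworks may receive resources from the gpu pool."""
    return pool != "gpu" or GPU_AWARE in framework["capabilities"]


def allocate(agents, frameworks):
    """Offer each agent to one framework, sharing fairly within each pool."""
    # allocated[pool][framework name] = agents offered from that pool so far
    allocated = defaultdict(lambda: defaultdict(int))
    offers = {}
    for agent in agents:
        pool = pool_of(agent)
        candidates = [f["name"] for f in frameworks if eligible(f, pool)]
        if not candidates:
            continue  # nobody may use this pool; the agent stays unoffered
        # Per-pool sorter: fewest allocations in this pool first, ties by name.
        chosen = min(candidates, key=lambda name: (allocated[pool][name], name))
        allocated[pool][chosen] += 1
        offers[agent["id"]] = chosen
    return offers


frameworks = [
    {"name": "analytics", "capabilities": set()},
    {"name": "training", "capabilities": {GPU_AWARE}},
]
agents = [
    {"id": "a1", "gpus": 0},
    {"id": "a2", "gpus": 4},
    {"id": "a3", "gpus": 0},
    {"id": "a4", "gpus": 0},
]
offers = allocate(agents, frameworks)
```

Here the GPU agent a2 is only ever offered to "training", while the non-GPU
agents are shared between both frameworks. Whether GPU_AWARE frameworks should
also take part in the default pool is an assumption of this sketch, not
something settled in the thread.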
> 
> I also had some discussion with Ben about AlexR's solution of implementing
> "requestResource". This API should be treated as an improvement for the
> problem of doing resource allocation pessimistically (e.g. we offer the
> GPUs to 1000 frameworks that decline them before offering them to the GPU
> framework that wants them). "requestResource" also provides *more
> information* to mesos: namely, it gives us awareness of demand.
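
To illustrate the difference that awareness of demand makes, here is a toy
model, assuming each offer goes to one framework at a time and is declined by
every framework that does not want GPUs. The function name and the "demand"
set are hypothetical stand-ins for whatever "requestResource" would record.

```python
def offers_until_accepted(frameworks, wants_gpu, demand=None):
    """Return how many offers are sent before a framework accepts the GPUs."""
    order = list(frameworks)
    if demand:
        # Stable sort: frameworks with registered demand are offered first.
        order.sort(key=lambda name: name not in demand)
    for count, name in enumerate(order, start=1):
        if name in wants_gpu:
            return count
    return None  # no framework wants the GPUs


frameworks = [f"fw-{i}" for i in range(1000)] + ["gpu-fw"]

# Pessimistic allocation: 1000 offers are declined before the GPU framework
# is reached.
pessimistic = offers_until_accepted(frameworks, wants_gpu={"gpu-fw"})

# With demand known up front, the first offer goes to the right framework.
demand_aware = offers_until_accepted(
    frameworks, wants_gpu={"gpu-fw"}, demand={"gpu-fw"}
)
```

This only models the ordering of offers; it says nothing about guaranteeing
that the requesting framework actually ends up with the scarce resources.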
> 
> Even though in some cases a framework can use "requestResource" to get all
> of the scarce resources, and once those scarce resources are in use the
> WDRF sorter will sort the non-scarce resources as normal, the problem
> is that we cannot guarantee that the framework which called
> "requestResource" can always consume all of the scarce resources before
> those scarce resources are allocated to other frameworks.
> 
> I'm planning to draft a document based on solution 1) "Create sub-pools"
> as the long-term solution. Any comments are welcome!
> 
> Thanks,
> 
> Guangya
> 
> On Sat, Jun 18, 2016 at 11:58 AM, Guangya Liu <gyliu...@gmail.com> wrote:
> 
>> Thanks Du Fan. So you mean that we should have some clear rules in the
>> documentation or somewhere else to guide the cluster admin on which
>> resources should be classified as scarce, right?
>> 
>> On Sat, Jun 18, 2016 at 2:38 AM, Du, Fan <fan...@intel.com> wrote:
>> 
>>> 
>>> 
>>> On 2016/6/17 7:57, Guangya Liu wrote:
>>> 
>>>> @Fan Du,
>>>> 
>>>> Currently, I think that the scarce resources should be defined by the
>>>> cluster admin; s/he can specify those scarce resources via a flag when
>>>> the master starts up.
>>>> 
>>> 
>>> This is not what I mean.
>>> IMO, it's not the cluster admin's call to decide which resources should be
>>> marked as scarce. They can carry out the operation, but they should be
>>> advised by a clear rule: to what extent the resource is scarce compared
>>> with other resources, and how that will affect wDRF by causing starvation
>>> for frameworks which hold scarce resources. That's my point.
>>> 
>>> To the best of my knowledge, a quantitative study of how wDRF behaves in
>>> scenarios with one or multiple scarce resources would first help to verify
>>> the proposed approach, and guide the users of this functionality.
>>> 
>>> 
>>> 
>>>> Regarding the proposal of generic scarce resources, do you have any
>>>> thoughts on this? I can see that giving framework developers the option
>>>> of defining scarce resources may bring trouble to mesos; it is better to
>>>> let mesos, not the framework developers, define what is scarce.
>>>> 
>>> 
>>
