I see. This is good to know for current development work. Thanks for clarifying, Guangya and Kevin.
Elizabeth Lingg

> On Jun 22, 2016, at 3:02 AM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> Hi Elizabeth,
>
> Just FYI, there is a JIRA tracking resource revocation here:
> https://issues.apache.org/jira/browse/MESOS-4967
>
> I'm also working on the short-term solution of excluding scarce
> resources from the allocator (https://reviews.apache.org/r/48906/). With
> this feature and Kevin's GPU_RESOURCES capability, Mesos can handle
> scarce resources well.
>
> Thanks,
>
> Guangya
>
> On Wed, Jun 22, 2016 at 4:45 AM, Kevin Klues <klue...@gmail.com> wrote:
>
>> As an FYI, preliminary support to work around this issue for GPUs will
>> appear in the 1.0 release:
>> https://reviews.apache.org/r/48914/
>>
>> This doesn't solve the problem of scarce resources in general, but it
>> will at least keep non-GPU workloads from starving out GPU-based
>> workloads on GPU-capable machines. The downside of this approach is
>> that only GPU-aware frameworks will be able to launch tasks on
>> GPU-capable machines (meaning some of their resources could go unused
>> unnecessarily). We decided this tradeoff is acceptable for now.
>>
>> Kevin
>>
>> On Tue, Jun 21, 2016 at 1:40 PM, Elizabeth Lingg
>> <elizabeth_li...@apple.com> wrote:
>>> Thanks, looking forward to discussion and review on your document. The
>>> main use case I see here is that some of our frameworks will want to
>>> request the GPU resources, and we want to make sure that those
>>> frameworks are able to successfully launch tasks on agents with those
>>> resources. We want to be certain that other frameworks that do not
>>> require GPUs will not request all the other resources on those agents
>>> (i.e. CPU, disk, memory), which would mean the GPU resources are not
>>> allocated and the frameworks that require them will not receive them.
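[A rough sketch of the gating Kevin describes: the allocator only offers resources from GPU-capable agents to frameworks that opted in via the GPU_RESOURCES capability. The names and data shapes below are invented for illustration; the real logic lives in Mesos' C++ hierarchical allocator.]

```python
# Sketch: skip offering resources from GPU-capable agents to frameworks
# that have not opted in via the GPU_RESOURCES capability. Illustrative
# only, not the actual Mesos implementation.

def eligible_frameworks(agent, frameworks):
    """Return the frameworks that may receive offers from this agent."""
    if agent.get("gpus", 0) > 0:
        # Only GPU-aware frameworks see GPU-capable agents, so non-GPU
        # workloads cannot starve out GPU workloads on these machines.
        return [f for f in frameworks if "GPU_RESOURCES" in f["capabilities"]]
    # Non-GPU agents are offered to everyone, as before.
    return frameworks

agents = [{"name": "agent1", "gpus": 4}, {"name": "agent2", "gpus": 0}]
frameworks = [
    {"name": "gpu-fw", "capabilities": {"GPU_RESOURCES"}},
    {"name": "cpu-fw", "capabilities": set()},
]

print([f["name"] for f in eligible_frameworks(agents[0], frameworks)])  # ['gpu-fw']
print([f["name"] for f in eligible_frameworks(agents[1], frameworks)])  # ['gpu-fw', 'cpu-fw']
```

[This also shows the downside Kevin mentions: if no GPU-aware framework wants agent1's CPU and memory, those resources go unused.]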
>>> As Ben Mahler mentioned, "(2) Because we do not have revocation yet,
>>> if a framework decides to consume the non-GPU resources on a GPU
>>> machine, it will prevent the GPU workloads from running!" This will
>>> occur for us in clusters where we have higher utilization as well as
>>> different types of workloads running. Smart task placement then
>>> becomes more relevant (i.e. we want to be able to schedule with scarce
>>> resources successfully, and we may have considerations like not
>>> scheduling too many I/O-bound workloads on a single host, or more
>>> stringent requirements for scheduling persistent tasks).
>>>
>>> Elizabeth Lingg
>>>
>>>> On Jun 20, 2016, at 7:24 PM, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>
>>>> Had some discussion with Ben M about the following two solutions:
>>>>
>>>> 1) Ben M: Create sub-pools of resources based on machine profile and
>>>> perform fair sharing / quota within each pool, plus a framework
>>>> capability GPU_AWARE to let the allocator filter out scarce
>>>> resources for some frameworks.
>>>> 2) Guangya: Add new sorters for non-scarce resources, plus a
>>>> framework capability GPU_AWARE to let the allocator filter out
>>>> scarce resources for some frameworks.
>>>>
>>>> The above two solutions mean the same thing, and there is no real
>>>> difference between them: creating sub-pools of resources requires
>>>> introducing a different sorter for each sub-pool, so I will merge the
>>>> two solutions into one.
>>>>
>>>> Also had some discussion with Ben about AlexR's solution of
>>>> implementing "requestResource". This API should be treated as an
>>>> improvement over doing resource allocation pessimistically (e.g. we
>>>> offer/decline the GPUs to 1000 frameworks before offering them to the
>>>> GPU framework that wants them). "requestResource" provides *more
>>>> information* to Mesos; namely, it gives us awareness of demand.
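[One way to read solution 1) above: partition agents by machine profile into sub-pools, then run an independent fair-share sorter within each pool. A minimal sketch, with invented names and a GPU/general split assumed as the profile rule; the real design would come from the document Guangya proposes.]

```python
# Sketch of "sub-pools of resources": group agents by machine profile
# (here: GPU vs general) so that each pool can get its own fair-share
# sorter. Hypothetical structure, not the Mesos allocator API.
from collections import defaultdict

def build_pools(agents):
    """Group agents into sub-pools keyed by machine profile."""
    pools = defaultdict(list)
    for agent in agents:
        profile = "gpu" if agent.get("gpus", 0) > 0 else "general"
        pools[profile].append(agent)
    return dict(pools)

agents = [
    {"name": "a1", "gpus": 2, "cpus": 8},
    {"name": "a2", "gpus": 0, "cpus": 16},
    {"name": "a3", "gpus": 0, "cpus": 16},
]
pools = build_pools(agents)

# Each pool would then get its own sorter, so scarce GPU resources do
# not distort fair shares computed over the general pool.
print(sorted(pools))          # ['general', 'gpu']
print(len(pools["general"]))  # 2
```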
>>>> Even for the cases where we can use "requestResource" to claim all
>>>> of the scarce resources (after which the wDRF sorter will sort the
>>>> non-scarce resources as normal), the problem is that we cannot
>>>> guarantee that the framework which called "requestResource" will
>>>> always consume all of the scarce resources before those scarce
>>>> resources are allocated to other frameworks.
>>>>
>>>> I'm planning to draft a document based on solution 1) "Create
>>>> sub-pools" for the long-term solution; any comments are welcome!
>>>>
>>>> Thanks,
>>>>
>>>> Guangya
>>>>
>>>> On Sat, Jun 18, 2016 at 11:58 AM, Guangya Liu <gyliu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Du Fan. So you mean that we should have some clear rules, in
>>>>> a document or somewhere else, to tell or guide cluster admins which
>>>>> resources should be classified as scarce resources, right?
>>>>>
>>>>> On Sat, Jun 18, 2016 at 2:38 AM, Du, Fan <fan...@intel.com> wrote:
>>>>>
>>>>>> On 2016/6/17 7:57, Guangya Liu wrote:
>>>>>>
>>>>>>> @Fan Du,
>>>>>>>
>>>>>>> Currently, I think that the scarce resources should be defined by
>>>>>>> the cluster admin; s/he can specify those scarce resources via a
>>>>>>> flag when the master starts up.
>>>>>>
>>>>>> This is not what I mean.
>>>>>> IMO, it's not the cluster admin's call to decide what resources
>>>>>> should be marked as scarce. They can carry out the operation, but
>>>>>> they should be advised based on a clear rule: to what extent is the
>>>>>> resource scarce compared with other resources, and how will it
>>>>>> affect wDRF by causing starvation for frameworks which hold scarce
>>>>>> resources? That's my point.
>>>>>>
>>>>>> To the best of my knowledge, a quantitative study of how wDRF
>>>>>> behaves in the scenario of one or multiple scarce resources would
>>>>>> help to verify the proposed approach and guide users of this
>>>>>> functionality.
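[The starvation Du Fan describes falls out of how DRF computes dominant shares: a framework holding even one unit of a cluster-scarce resource gets a very large dominant share and is deprioritized for everything else. A small worked example with invented totals and allocations:]

```python
# Sketch of DRF dominant-share arithmetic with a scarce resource.
# Cluster totals and framework allocations are invented for illustration.

def dominant_share(allocation, totals):
    """DRF: a framework's share is the max of its per-resource shares."""
    return max(allocation.get(r, 0) / totals[r] for r in totals)

totals = {"cpus": 1000, "mem": 4000, "gpus": 2}  # GPUs are scarce

# A GPU framework holding 1 of the 2 cluster GPUs has dominant share
# 0.5, even though it uses almost no CPU or memory...
gpu_fw = {"cpus": 4, "mem": 16, "gpus": 1}
# ...while a framework holding 10% of all CPUs has dominant share 0.1,
# so it sorts ahead of the GPU framework for every future offer.
cpu_fw = {"cpus": 100, "mem": 200, "gpus": 0}

print(dominant_share(gpu_fw, totals))  # 0.5
print(dominant_share(cpu_fw, totals))  # 0.1
```

[This is the kind of quantitative behavior a study would pin down: how far a scarce resource inflates dominant shares, and at what ratio starvation sets in.]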
>>>>>>
>>>>>>> Regarding the proposal of generic scarce resources, do you have
>>>>>>> any thoughts on this? I can see that giving framework developers
>>>>>>> the option of defining scarce resources may bring trouble to
>>>>>>> Mesos; it is better to let Mesos define those scarce resources
>>>>>>> rather than framework developers.

>> --
>> ~Kevin