+1 for leveraging `requestResources`. I've also toyed with this idea with
allocator groups offline. IMO, giving schedulers a way to specify a resource
envelope size and/or constraints is an easier way to manage these resources.
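As a rough sketch of the envelope idea (plain Python, purely illustrative; these names are hypothetical and not the actual Mesos `requestResources` API, which takes Request protos), the point is that a scheduler declares the whole resource bundle it needs, so the allocator can match it only against agents that satisfy the full envelope, avoiding the 2-step offer problem Hans raised:

```python
def satisfies(agent, envelope):
    """True if the agent has at least every resource amount in the envelope."""
    return all(agent.get(name, 0) >= amount
               for name, amount in envelope.items())

# Agents from Ben's example cluster: one of the 999 non-GPU agents,
# plus the single GPU agent.
agents = [
    {"cpus": 4, "mem": 1024, "disk": 1024},
    {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024},
]

# The envelope names the scarce resource *together with* the cpu/mem the
# task needs co-located, so a matching offer is usable on its own.
envelope = {"gpus": 1, "cpus": 2, "mem": 512}

matching = [a for a in agents if satisfies(a, envelope)]
```

Here only the GPU agent matches, and the resulting offer would already carry the co-located cpu and memory.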

On Thu, Jun 16, 2016 at 9:39 AM, Alex Rukletsov <a...@mesosphere.com> wrote:

> We definitely don't want a 2-step scenario. In this case, a framework may
> not be able to launch its tasks on GPU resources, while still holding them.
>
> However, having a dedicated sorter for scarce resources does not mean we
> should allocate them separately. Also, I'm not sure Guangya intended to
> enumerate allocation stages, it looks like he simply listed the sorters. I
> don't see why we may want to allocate scarce resources after allocating
> revocable.
>
> I think a more important question is how to effectively offer scarce
> resources to frameworks that are interested in them. Maybe we can leverage
> `requestResources`?
>
> On Thu, Jun 16, 2016 at 5:22 PM, Hans van den Bogert <hansbog...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Maybe I’m missing context info on how something like a GPU as a resource
> > should work, but I assume that the general scenario would be that the GPU
> > host application would still need memory and cpu(s) co-located on the
> > node.
> > In the case of,
> > > 4) scarceSorter include 1 agent with (gpus:1)
> >
> > If I understand your meaning correctly, this round of offers would, in
> > this case, consist only of the GPU resource. Is it then up to the
> > framework to figure out that it will also need cpu and memory on the
> > agent’s node? If so, it would need at least one more offer for that node
> > to make the GPU resource useful. Such a 2-step offer/accept is rather
> > cumbersome.
> >
> > Regards,
> >
> > Hans
> >
> >
> >
> > > On Jun 16, 2016, at 11:26 AM, Guangya Liu <gyliu...@gmail.com> wrote:
> > >
> > > Hi Ben,
> > >
> > > The pre-condition for four stage allocation is that we need to put
> > > different resources to different sorters:
> > >
> > > 1) roleSorter includes only non-scarce resources.
> > > 2) quotaRoleSorter includes only non-revocable & non-scarce resources.
> > > 3) revocableSorter includes only revocable & non-scarce resources. This
> > > will be handled in MESOS-4923
> > > <https://issues.apache.org/jira/browse/MESOS-4923>
> > > 4) scarceSorter includes only scarce resources.
> > >
> > > Take your case above:
> > > 999 agents with (cpus:4,mem:1024,disk:1024)
> > > 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> > >
> > > The four sorters would be:
> > > 1) roleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
> > > 2) quotaRoleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
> > > 3) revocableSorter includes nothing, as I have no revocable resources
> > > here.
> > > 4) scarceSorter includes 1 agent with (gpus:1)
> > >
> > > When allocating resources, even if a role gets the agent with gpu
> > > resources, its share will only be counted by the scarceSorter and will
> > > not impact the other sorters.
> > >
> > > The above solution is actually a kind of enhancement to "excluding
> > > scarce resources", as the scarce resources also obey the DRF algorithm
> > > with this.
> > >
> > > This solution can also be treated as logically dividing the whole
> > > resource pool into a scarce and a non-scarce pool: 1), 2) and 3) will
> > > handle non-scarce resources while 4) focuses on scarce resources.
> > >
> > > Thanks,
> > >
> > > Guangya
> > >
> > > On Thu, Jun 16, 2016 at 2:10 AM, Benjamin Mahler <bmah...@apache.org>
> > wrote:
> > >
> > >> Hm.. can you expand on how adding another allocation stage for only
> > >> scarce resources would behave well? It seems to have a number of
> > >> problems when I think through it.
> > >>
> > >> On Sat, Jun 11, 2016 at 7:59 AM, Guangya Liu <gyliu...@gmail.com>
> > wrote:
> > >>
> > >>> Hi Ben,
> > >>>
> > >>> For the long term goal, instead of creating sub-pools, what about
> > >>> adding a new sorter to handle **scarce** resources? The current logic
> > >>> in the allocator is divided into two stages: allocation for quota,
> > >>> allocation for non-quota resources.
> > >>>
> > >>> I think that the future logic in the allocator would be divided into
> > >>> four stages:
> > >>> 1) allocation for quota
> > >>> 2) allocation for reserved resources
> > >>> 3) allocation for revocable resources
> > >>> 4) allocation for scarce resources
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Guangya
> > >>>
> > >>> On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler <bmah...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> I wanted to start a discussion about the allocation of "scarce"
> > >>>> resources. "Scarce" in this context means resources that are not
> > present on
> > >>>> every machine. GPUs are the first example of a scarce resource that
> we
> > >>>> support as a known resource type.
> > >>>>
> > >>>> Consider the behavior when there are the following agents in a
> > cluster:
> > >>>>
> > >>>> 999 agents with (cpus:4,mem:1024,disk:1024)
> > >>>> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> > >>>>
> > >>>> Here there are 1000 machines but only 1 has GPUs. We call GPUs a
> > >>>> "scarce" resource here because they are only present on a small
> > percentage
> > >>>> of the machines.
> > >>>>
> > >>>> We end up with some problematic behavior here with our current
> > >>>> allocation model:
> > >>>>
> > >>>>    (1) If a role wishes to use both GPU and non-GPU resources for
> > >>>> tasks, consuming 1 GPU will lead DRF to consider the role to have a
> > 100%
> > >>>> share of the cluster, since it consumes 100% of the GPUs in the
> > cluster.
> > >>>> This framework will then not receive any other offers.
> > >>>>
> > >>>>    (2) Because we do not have revocation yet, if a framework decides
> > to
> > >>>> consume the non-GPU resources on a GPU machine, it will prevent the
> > GPU
> > >>>> workloads from running!
> > >>>>
> > >>>> --------
> > >>>>
> > >>>> I filed an epic [1] to track this. The plan for the short-term is to
> > >>>> introduce two mechanisms to mitigate these issues:
> > >>>>
> > >>>>    -Introduce a resource fairness exclusion list. This allows the
> > >>>> shares of resources like "gpus" to be excluded from the dominant
> > share.
> > >>>>
> > >>>>    -Introduce a GPU_AWARE framework capability. This indicates that
> > the
> > >>>> scheduler is aware of GPUs and will schedule tasks accordingly. Old
> > >>>> schedulers will not have the capability and will not receive any
> > offers for
> > >>>> GPU machines. If a scheduler has the capability, we'll advise that
> > they
> > >>>> avoid placing their additional non-GPU workloads on the GPU
> machines.
> > >>>>
> > >>>> --------
> > >>>>
> > >>>> Longer term, we'll want a more robust way to manage scarce
> resources.
> > >>>> The first thought we had was to have sub-pools of resources based on
> > >>>> machine profile and perform fair sharing / quota within each pool.
> > This
> > >>>> addresses (1) cleanly, and for (2) the operator needs to explicitly
> > >>>> disallow non-GPU frameworks from participating in the GPU pool.
> > >>>>
> > >>>> Unfortunately, by excluding non-GPU frameworks from the GPU pool we
> > may
> > >>>> have a lower level of utilization. In the even longer term, as we
> add
> > >>>> revocation it will be possible to allow a scheduler desiring GPUs to
> > revoke
> > >>>> the resources allocated to the non-GPU workloads running on the GPU
> > >>>> machines. There are a number of things we need to put in place to
> > support
> > >>>> revocation ([2], [3], [4], etc), so I'm glossing over the details
> > here.
> > >>>>
> > >>>> If anyone has any thoughts or insight in this area, please share!
> > >>>>
> > >>>> Ben
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/MESOS-5377
> > >>>> [2] https://issues.apache.org/jira/browse/MESOS-5524
> > >>>> [3] https://issues.apache.org/jira/browse/MESOS-5527
> > >>>> [4] https://issues.apache.org/jira/browse/MESOS-4392
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>
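To make the share-accounting problem Ben described concrete: here is a small sketch (plain Python, not the actual C++ allocator code) of how the proposed resource fairness exclusion list would change the dominant share that DRF computes for a role holding the cluster's single GPU. The function name and shapes are illustrative assumptions, not the real sorter interface.

```python
def dominant_share(allocation, totals, excluded=()):
    """DRF dominant share: the largest fraction the role holds of any
    cluster resource, skipping resources on the exclusion list."""
    shares = [allocation.get(name, 0) / total
              for name, total in totals.items()
              if name not in excluded and total > 0]
    return max(shares, default=0.0)

# Totals for Ben's example: 999 agents with (cpus:4,mem:1024,disk:1024)
# plus 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024).
totals = {"cpus": 4000, "mem": 1024000, "disk": 1024000, "gpus": 1}

# A role that consumed the single GPU plus a sliver of cpu/mem.
alloc = {"gpus": 1, "cpus": 4, "mem": 1024}

# Without exclusion, holding 1 of 1 GPUs makes the role look like it
# owns 100% of the cluster, starving it of further offers.
without_exclusion = dominant_share(alloc, totals)

# Excluding "gpus" from the dominant share, the role's share reflects
# only its tiny cpu/mem footprint (4/4000 = 0.1%).
with_exclusion = dominant_share(alloc, totals, excluded=("gpus",))
```

This is exactly problem (1) from the thread: the unexcluded share jumps to 1.0, while the excluded variant stays near 0.001, so the role keeps receiving non-GPU offers.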



-- 
Cheers,

Zhitao Li
