With this 4th sorter approach, how does quota work for scarce resources?

—
*Joris Van Remoortere*
Mesosphere

On Thu, Jun 16, 2016 at 11:26 AM, Guangya Liu <gyliu...@gmail.com> wrote:

> Hi Ben,
>
> The pre-condition for the four-stage allocation is that we put
> different resources into different sorters:
>
> 1) roleSorter includes only non-scarce resources.
> 2) quotaRoleSorter includes only non-revocable, non-scarce resources.
> 3) revocableSorter includes only revocable, non-scarce resources. This will
> be handled in MESOS-4923 <https://issues.apache.org/jira/browse/MESOS-4923>
> 4) scarceSorter includes only scarce resources.
>
> Take your case above:
> 999 agents with (cpus:4,mem:1024,disk:1024)
> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
>
> The four sorters would be:
> 1) roleSorter include 1000 agents with (cpus:4,mem:1024,disk:1024)
> 2) quotaRoleSorter include 1000 agents with (cpus:4,mem:1024,disk:1024)
> 3) revocableSorter include nothing as I have no revocable resources here.
> 4) scarceSorter include 1 agent with (gpus:1)
>
> When allocating resources, even if a role gets the agent with GPU
> resources, its share will be counted only by the scarceSorter, not by the
> other sorters, so it will not impact them.
>
> The above solution is actually an enhancement of "excluding scarce
> resources", since with it the scarce resources still obey the DRF algorithm.
>
> This solution can also be seen as logically dividing the whole resource
> pool into a scarce and a non-scarce pool: 1), 2) and 3) handle the
> non-scarce resources while 4) focuses on the scarce resources.
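As a rough illustration of the proposed split (a hypothetical Python sketch, not actual Mesos allocator code; the pool names merely mirror the sorters above):

```python
# Hypothetical sketch of the proposed four-way split: each agent's
# resources are routed to a pool by kind. Illustrative only.

SCARCE = {"gpus"}  # operator-designated scarce resource types (assumed)

def partition(agent, revocable=frozenset()):
    """Route an agent's resources into the four sorter pools."""
    pools = {"role": {}, "quotaRole": {}, "revocable": {}, "scarce": {}}
    for name, amount in agent.items():
        if name in SCARCE:
            pools["scarce"][name] = amount
        elif name in revocable:
            pools["revocable"][name] = amount
        else:
            # Non-revocable, non-scarce resources feed both the
            # roleSorter and the quotaRoleSorter.
            pools["role"][name] = amount
            pools["quotaRole"][name] = amount
    return pools

gpu_agent = {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024}
print(partition(gpu_agent)["scarce"])  # {'gpus': 1}
print(partition(gpu_agent)["role"])    # {'cpus': 4, 'mem': 1024, 'disk': 1024}
```

Under this split, a role's consumption of the lone GPU only ever shows up in the scarce pool's accounting.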
>
> Thanks,
>
> Guangya
>
> On Thu, Jun 16, 2016 at 2:10 AM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > Hm.. can you expand on how adding another allocation stage for only
> > scarce resources would behave well? It seems to have a number of problems
> > when I think through it.
> >
> > On Sat, Jun 11, 2016 at 7:59 AM, Guangya Liu <gyliu...@gmail.com> wrote:
> >
> >> Hi Ben,
> >>
> >> For the long-term goal, instead of creating sub-pools, what about adding
> >> a new sorter to handle **scarce** resources? The current logic in the
> >> allocator is divided into two stages: allocation for quota, and
> >> allocation for non-quota resources.
> >>
> >> I think that the future logic in allocator would be divided to four
> >> stages:
> >> 1) allocation for quota
> >> 2) allocation for reserved resources
> >> 3) allocation for revocable resources
> >> 4) allocation for scarce resources
> >>
> >> Thanks,
> >>
> >> Guangya
> >>
> >> On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler <bmah...@apache.org>
> >> wrote:
> >>
> >>> I wanted to start a discussion about the allocation of "scarce"
> >>> resources. "Scarce" in this context means resources that are not
> >>> present on every machine. GPUs are the first example of a scarce
> >>> resource that we support as a known resource type.
> >>>
> >>> Consider the behavior when there are the following agents in a cluster:
> >>>
> >>> 999 agents with (cpus:4,mem:1024,disk:1024)
> >>> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> >>>
> >>> Here there are 1000 machines but only 1 has GPUs. We call GPUs a
> >>> "scarce" resource here because they are only present on a small
> >>> percentage of the machines.
> >>>
> >>> We end up with some problematic behavior here with our current
> >>> allocation model:
> >>>
> >>>     (1) If a role wishes to use both GPU and non-GPU resources for
> >>> tasks, consuming 1 GPU will lead DRF to consider the role to have a
> >>> 100% share of the cluster, since it consumes 100% of the GPUs in the
> >>> cluster. This framework will then not receive any other offers.
> >>>
> >>>     (2) Because we do not have revocation yet, if a framework decides
> >>> to consume the non-GPU resources on a GPU machine, it will prevent the
> >>> GPU workloads from running!
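The dominant-share arithmetic behind problem (1) can be sketched in a few lines (a hypothetical Python illustration, not Mesos allocator code; the numbers are the 1000-agent cluster from the example above):

```python
# Hypothetical sketch of DRF's dominant-share computation, showing how
# a single GPU dominates a role's share. Not actual Mesos code.

def dominant_share(allocation, total):
    """Dominant share = max over resource types of allocated / total."""
    return max(allocation.get(r, 0) / total[r] for r in total)

# Cluster totals: 999 agents (cpus:4,mem:1024,disk:1024) plus
# 1 agent (gpus:1,cpus:4,mem:1024,disk:1024).
total = {"cpus": 4000, "mem": 1024000, "disk": 1024000, "gpus": 1}

# A role that consumed the single GPU plus a sliver of cpu and memory.
allocation = {"cpus": 1, "mem": 128, "gpus": 1}

print(dominant_share(allocation, total))  # 1.0 -- a 100% dominant share
```

With a dominant share of 1.0, DRF sorts this role behind every other role, so it stops receiving offers despite using almost none of the cluster.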
> >>>
> >>> --------
> >>>
> >>> I filed an epic [1] to track this. The plan for the short-term is to
> >>> introduce two mechanisms to mitigate these issues:
> >>>
> >>>     - Introduce a resource fairness exclusion list. This allows the
> >>> shares of resources like "gpus" to be excluded from the dominant share.
> >>>
> >>>     - Introduce a GPU_AWARE framework capability. This indicates that
> >>> the scheduler is aware of GPUs and will schedule tasks accordingly. Old
> >>> schedulers will not have the capability and will not receive any offers
> >>> for GPU machines. If a scheduler has the capability, we'll advise that
> >>> they avoid placing their additional non-GPU workloads on the GPU
> >>> machines.
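The exclusion-list mechanism can be illustrated with a small sketch (hypothetical Python, not the actual Mesos implementation; the `excluded` parameter name is an assumption):

```python
# Hypothetical sketch of a resource fairness exclusion list: excluded
# resource types simply do not participate in the dominant-share max.

def dominant_share(allocation, total, excluded=frozenset()):
    """Dominant share over non-excluded resource types only."""
    shares = [allocation.get(r, 0) / total[r]
              for r in total if r not in excluded]
    return max(shares) if shares else 0.0

total = {"cpus": 4000, "mem": 1024000, "disk": 1024000, "gpus": 1}
allocation = {"cpus": 1, "mem": 128, "gpus": 1}

print(dominant_share(allocation, total))                     # 1.0
print(dominant_share(allocation, total, excluded={"gpus"}))  # 0.00025
```

Excluding "gpus" drops the role's dominant share from 100% back to its cpu share, so it keeps receiving offers; the trade-off is that GPU consumption is no longer fair-shared at all, which is what motivates the longer-term designs below.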
> >>>
> >>> --------
> >>>
> >>> Longer term, we'll want a more robust way to manage scarce resources.
> >>> The first thought we had was to have sub-pools of resources based on
> >>> machine profile and perform fair sharing / quota within each pool. This
> >>> addresses (1) cleanly, and for (2) the operator needs to explicitly
> >>> disallow non-GPU frameworks from participating in the GPU pool.
> >>>
> >>> Unfortunately, by excluding non-GPU frameworks from the GPU pool we may
> >>> have a lower level of utilization. In the even longer term, as we add
> >>> revocation it will be possible to allow a scheduler desiring GPUs to
> >>> revoke the resources allocated to the non-GPU workloads running on the
> >>> GPU machines. There are a number of things we need to put in place to
> >>> support revocation ([2], [3], [4], etc.), so I'm glossing over the
> >>> details here.
> >>>
> >>> If anyone has any thoughts or insight in this area, please share!
> >>>
> >>> Ben
> >>>
> >>> [1] https://issues.apache.org/jira/browse/MESOS-5377
> >>> [2] https://issues.apache.org/jira/browse/MESOS-5524
> >>> [3] https://issues.apache.org/jira/browse/MESOS-5527
> >>> [4] https://issues.apache.org/jira/browse/MESOS-4392
> >>>
> >>
> >>
> >
>
