[
https://issues.apache.org/jira/browse/MESOS-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joerg Schad updated MESOS-3202:
-------------------------------
Description:
We currently run into issues with the DRF scheduler where some frameworks
never receive offers (see https://github.com/mesosphere/marathon/issues/1931
for details).
Imagine that we have 10 frameworks and unallocated resources from a single
slave.
The allocation interval is 1 sec, and refuse_seconds (i.e. the time for which
a declined resource is filtered) is 3 sec for all frameworks.
The allocator offers the resources to framework 1 (according to DRF), which
declines the offer immediately.
In the next allocation interval framework 1 is skipped because of the filter
from its earlier decline. Hence the resources are offered to the next
framework, framework 2, which also declines.
The same happens in the next allocation interval with framework 3.
In the following allocation interval the refuse_seconds for framework 1 have
elapsed, and since it still has the lowest DRF share it is offered the
resources again, which it again declines. The cycle then repeats.
Framework 4 (which is actually waiting for these resources) is never offered
them.
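The scenario above can be reproduced with a small toy model. This is only a
sketch, not the Mesos allocator: the function name simulate, its parameters,
and the tie-breaking rule (equal DRF shares fall back to framework number) are
assumptions made for illustration. It also assumes that frameworks other than
framework 4 always decline and that a filter expires exactly refuse_seconds
after the decline.

```python
def simulate(num_frameworks=10, refuse_seconds=3, allocation_interval=1,
             cycles=12, wants=frozenset({4})):
    """Return the framework numbers offered the resource, one per cycle.

    Hypothetical model: a single resource, all frameworks hold an equal
    (empty) DRF share, so the order is assumed to be the framework number.
    """
    filter_expiry = {}  # framework number -> time its decline filter expires
    offered = []
    for cycle in range(cycles):
        now = cycle * allocation_interval
        for fw in range(1, num_frameworks + 1):
            if now < filter_expiry.get(fw, 0):
                continue  # still filtered because of an earlier decline
            offered.append(fw)
            if fw in wants:
                return offered  # the framework accepts; resource is taken
            # The framework declines; filter it for refuse_seconds.
            filter_expiry[fw] = now + refuse_seconds
            break  # the single resource was offered once this cycle
    return offered


print(simulate())  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```

With the numbers from the description, framework 4 never appears in the
result, however many cycles are simulated. In this toy model the offer only
reaches framework 4 once refuse_seconds is longer than the time the allocator
needs to cycle through frameworks 1 to 3 (for example, refuse_seconds=4).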
was: We currently run into issues with the DRF scheduler that frameworks do
not receive offers (see https://github.com/mesosphere/marathon/issues/1931 for
details).
> Avoid frameworks starving in DRF allocator.
> -------------------------------------------
>
> Key: MESOS-3202
> URL: https://issues.apache.org/jira/browse/MESOS-3202
> Project: Mesos
> Issue Type: Bug
> Reporter: Joerg Schad
>