[ https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-4302:
-----------------------------------
Description:
Currently, when the allocator recovers resources from an offer, it creates a
filter timeout based on the time at which the call is processed.
This means that if it takes longer than the filter duration for the allocator
to perform an allocation for the relevant agent, then the filter is never
applied.
This leads to pathological behavior: if the framework sets a filter duration
that is smaller than the wall clock time it takes for us to perform the next
allocation, then the filters will have no effect. This can mean that low share
frameworks may continue receiving offers that they have no intent to use,
without other frameworks ever receiving these offers.
The workaround is for frameworks to set high filter durations and possibly
revive offers when they need more resources. However, we should fix this issue
in the allocator (i.e. derive the timeout deadlines and expiry based on
allocation times).
This seems to warrant cherry-picking into bug fix releases.
was:
Currently, when the allocator recovers resources from an offer, it creates a
filter timeout based on the time at which the call is processed.
This means that if it takes longer than the filter duration for the allocator
to perform an allocation for the relevant agent, then the filter is never
applied.
This leads to pathological behavior: if the framework sets a filter duration
that is smaller than the wall clock time it takes for us to perform the next
allocation, then the filters will have no effect. This can mean that low share
frameworks may continue receiving offers that they have no intent to use,
without other frameworks ever receiving these offers.
The workaround is for frameworks to set high filter durations and possibly
revive offers when they need more resources. However, we should fix this issue
in the allocator (i.e. derive the timeout deadlines and expiry based on
allocation times).
This seems to warrant cherry-picking into bug fix releases for future versions.
> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -------------------------------------------------------------------------
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Reporter: Benjamin Mahler
> Priority: Critical
> Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a
> filter timeout based on the time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator
> to perform an allocation for the relevant agent, then the filter is never
> applied.
> This leads to pathological behavior: if the framework sets a filter duration
> that is smaller than the wall clock time it takes for us to perform the next
> allocation, then the filters will have no effect. This can mean that low
> share frameworks may continue receiving offers that they have no intent to
> use, without other frameworks ever receiving these offers.
> The workaround is for frameworks to set high filter durations and possibly
> revive offers when they need more resources. However, we should fix this
> issue in the allocator (i.e. derive the timeout deadlines and expiry based
> on allocation times).
> This seems to warrant cherry-picking into bug fix releases.
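
To make the failure mode concrete, here is a minimal sketch of the two expiry
rules. It is a toy model, not the Mesos allocator code: the function names are
hypothetical, and "allocation-based" expiry is only one possible reading of the
proposed fix (the filter duration starts counting at the first allocation that
follows the decline).

```python
from typing import List


def offers_wall_clock(decline_time: float,
                      duration: float,
                      allocation_times: List[float]) -> List[float]:
    """Allocation times at which the declined resources are re-offered to the
    same framework when the filter expires `duration` seconds after the
    decline was processed (the behavior described in this issue)."""
    deadline = decline_time + duration
    return [t for t in allocation_times if t > decline_time and t >= deadline]


def offers_allocation_based(decline_time: float,
                            duration: float,
                            allocation_times: List[float]) -> List[float]:
    """Same question, but under an assumed fix: the deadline is derived from
    the first allocation after the decline, so the filter always suppresses
    at least that allocation."""
    later = [t for t in allocation_times if t > decline_time]
    if not later:
        return []
    deadline = later[0] + duration
    return [t for t in later if t >= deadline]


# A backlogged allocator: one allocation every 8 seconds, filter of 5 seconds.
slow_allocations = [8.0, 16.0, 24.0]
print(offers_wall_clock(0.0, 5.0, slow_allocations))        # [8.0, 16.0, 24.0]
print(offers_allocation_based(0.0, 5.0, slow_allocations))  # [16.0, 24.0]
```

With wall-clock expiry the filter has already lapsed by the time the first
allocation runs, so the framework is re-offered the resources on every cycle.
With the allocation-based deadline the first cycle is filtered, which gives
other frameworks a chance to receive the offer.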