[
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088488#comment-15088488
]
Guangya Liu commented on MESOS-4302:
------------------------------------
I have a draft idea for this, shown below
(https://reviews.apache.org/r/42028/ still has some problems with its unit
tests): if the filter duration for recovered resources is less than the
allocation interval, set the filter duration to the allocation interval and
log an INFO-level message telling the end user what the allocator is doing.
[~kaysoky] [~alexr] what do you think? Thanks.
{code}
if (seconds.get() != Duration::zero()) {
  // Never let a filter expire before the next allocation cycle can see it:
  // clamp durations shorter than the allocation interval up to the interval.
  Duration filterTimeOut = seconds.get();
  if (filterTimeOut < allocationInterval) {
    filterTimeOut = allocationInterval;
    LOG(INFO) << "Framework " << frameworkId
              << " filtered slave " << slaveId
              << " for " << seconds.get()
              << " which is less than allocationInterval "
              << allocationInterval
              << ", using allocationInterval "
              << allocationInterval
              << " instead to make sure the recovered resources can"
              << " be aggregated for at least one allocation cycle.";
  } else {
    VLOG(1) << "Framework " << frameworkId
            << " filtered slave " << slaveId
            << " for " << filterTimeOut;
  }
  // ... rest of the block (not shown in this draft) ...
}
{code}
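The clamping rule on its own can be sketched as a small standalone function. This is only an illustration of the idea in the draft above, not the allocator's actual code: `effectiveFilterTimeout` and the `Seconds` alias are hypothetical names, and `std::chrono` stands in for the `Duration` type used in the snippet. It assumes, as in the draft, that a zero duration means "no filter" and is left untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Stand-in for the Duration type in the draft (assumption for this sketch).
using Seconds = std::chrono::duration<double>;

// Hypothetical helper, not part of the Mesos allocator API.
// Returns the filter duration to apply, given the duration the framework
// requested and the allocation interval.
inline Seconds effectiveFilterTimeout(
    Seconds requested,
    Seconds allocationInterval)
{
  // The clamp only makes sense for a positive allocation interval.
  assert(allocationInterval > Seconds::zero());

  // A zero request means "no filter"; pass it through unchanged.
  if (requested == Seconds::zero()) {
    return requested;
  }

  // Otherwise keep the filter alive for at least one allocation interval.
  return std::max(requested, allocationInterval);
}
```

With a 1 second allocation interval, a 0.5 second request would be raised to 1 second, while a 5 second request would be kept as is.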
> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -------------------------------------------------------------------------
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Reporter: Benjamin Mahler
> Assignee: Alexander Rukletsov
> Priority: Critical
> Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a
> filter timeout based on the time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator
> to perform an allocation for the relevant agent, then the filter is never
> applied.
> This leads to pathological behavior: if the framework sets a filter duration
> that is smaller than the wall clock time it takes for us to perform the next
> allocation, then the filters will have no effect. This can mean that low
> share frameworks may continue receiving offers that they have no intent to
> use, without other frameworks ever receiving these offers.
> The workaround for this is for frameworks to set high filter durations and
> possibly revive offers when they need more resources. However, we should
> fix this issue in the allocator (i.e. derive the timeout deadlines and
> expiry based on allocation times).
> This seems to warrant cherry-picking into bug fix releases.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)