[ 
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088488#comment-15088488
 ] 

Guangya Liu commented on MESOS-4302:
------------------------------------

I have a draft idea for this 
(https://reviews.apache.org/r/42028/ still has some problems with the unit 
tests): if the filter duration for recovered resources is shorter than the 
allocation interval, set the filter duration to the allocation interval and 
log an INFO-level message telling the end user what the allocator is doing. 
[~kaysoky] [~alexr] what do you think? Thanks.

{code}
  if (seconds.get() != Duration::zero()) {
    Duration filterTimeOut = seconds.get();
    if (filterTimeOut < allocationInterval) {
      filterTimeOut = allocationInterval;
      LOG(INFO) << "Framework " << frameworkId
                << " filtered slave " << slaveId
                << " for " << seconds.get()
                << " which is less than allocationInterval "
                << allocationInterval
                << ", using allocationInterval "
                << allocationInterval
                << " instead to make sure the recovered resources can"
                << " be aggregated for at least one allocation cycle.";
    } else {
      VLOG(1) << "Framework " << frameworkId
              << " filtered slave " << slaveId
              << " for " << filterTimeOut;
    }
{code}
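
To make the intended rule easier to check in isolation, here is a minimal standalone sketch of the clamping logic only. It is not the patch in the review: `Seconds` and `effectiveFilterTimeout` are hypothetical names, and `std::chrono` stands in for the `Duration` type used in the snippet above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the Duration type in the draft above.
using Seconds = std::chrono::duration<double>;

// Returns the filter timeout that would actually be installed.
// A zero duration means "no filter" and is passed through unchanged,
// mirroring the `!= Duration::zero()` check in the draft.
inline Seconds effectiveFilterTimeout(
    Seconds requested,
    Seconds allocationInterval)
{
  assert(allocationInterval > Seconds::zero());

  if (requested == Seconds::zero()) {
    return requested;
  }

  // Round short filters up so they span at least one allocation cycle.
  return requested < allocationInterval ? allocationInterval : requested;
}
```

With a 1 second allocation interval, a 0.5 second filter would be raised to 1 second, while a 5 second filter would be left as requested.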

> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -------------------------------------------------------------------------
>
>                 Key: MESOS-4302
>                 URL: https://issues.apache.org/jira/browse/MESOS-4302
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: Benjamin Mahler
>            Assignee: Alexander Rukletsov
>            Priority: Critical
>              Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a 
> filter timeout based on the time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator 
> to perform an allocation for the relevant agent, then the filter is never 
> applied.
> This leads to pathological behavior: if the framework sets a filter duration 
> that is smaller than the wall clock time it takes for us to perform the next 
> allocation, then the filters will have no effect. This can mean that low 
> share frameworks may continue receiving offers that they have no intent to 
> use, without other frameworks ever receiving these offers.
> The workaround is for frameworks to set high filter durations and 
> possibly revive offers when they need more resources. However, we should 
> fix this issue in the allocator (i.e., derive the timeout deadlines and 
> expiry based on allocation times).
> This seems to warrant cherry-picking into bug fix releases.
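
One possible reading of "derive the timeout deadlines and expiry based on allocation times" in the description quoted above is to start the filter's clock at the first allocation that follows the decline, rather than when the call is processed. The sketch below illustrates only that reading. `OfferFilter` and `filtered` are hypothetical names, not Mesos APIs, and this is not necessarily the fix the allocator should adopt.

```cpp
#include <cassert>
#include <chrono>

using TimePoint = std::chrono::steady_clock::time_point;
using Millis = std::chrono::milliseconds;

// Hypothetical filter whose deadline is fixed lazily, at allocation time.
struct OfferFilter
{
  explicit OfferFilter(Millis d) : duration(d), armed(false), deadline() {}

  Millis duration;    // Filter duration requested by the framework.
  bool armed;         // Has an allocation seen this filter yet?
  TimePoint deadline; // Only meaningful once armed.
};

// Called once per allocation for the relevant agent. Returns true if the
// agent's resources should be withheld from the framework in this cycle.
inline bool filtered(OfferFilter& filter, TimePoint allocationTime)
{
  assert(filter.duration >= Millis::zero());

  if (filter.duration == Millis::zero()) {
    return false; // A zero duration means no filter was requested.
  }

  if (!filter.armed) {
    // Start the clock at the first allocation after the decline, so a
    // slow or backlogged allocator cannot skip the filter entirely.
    filter.armed = true;
    filter.deadline = allocationTime + filter.duration;
    return true;
  }

  return allocationTime < filter.deadline;
}
```

Under this sketch, a 100 ms filter still applies when the first allocation happens 500 ms after the decline: it stays in force until 600 ms instead of having already expired.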



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
