Thanks Meng for the explanation.

I imagine most frameworks do not remember what they filtered, much less
figure out how previously filtered offers could satisfy new operations.
That sounds complicated!

But I like your example. So a suggestion we could make to frameworks could
be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
app (they might want to use this even if they aren't suppressed!); and to
use UNSUPPRESS when they are rescheduling old work?
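
As a rough sketch of that suggestion (the names below are made up for
illustration and are not part of any Mesos API; I'm also assuming that a
suppressed framework always needs UNSUPPRESS before it sees offers again,
whether the work is new or old):

```python
from enum import Enum
from typing import List


class Call(Enum):
    # Hypothetical labels for the two proposed scheduler calls.
    UNSUPPRESS = "UNSUPPRESS"
    CLEAR_FILTERS = "CLEAR_FILTERS"


def calls_for(is_new_work: bool, suppressed: bool) -> List[Call]:
    """Pick which calls a framework might send when work arrives.

    is_new_work: True for scale up/down or a new app,
                 False when rescheduling old work.
    suppressed:  True if the framework previously sent SUPPRESS.
    """
    calls = []
    if suppressed:
        # Assumption: a suppressed framework must resume offers first.
        calls.append(Call.UNSUPPRESS)
    if is_new_work:
        # New work may fit offers that were filtered for the old work.
        calls.append(Call.CLEAR_FILTERS)
    return calls
```

So rescheduling old work while suppressed would send only UNSUPPRESS, and
new work on an unsuppressed framework would send only CLEAR_FILTERS.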

Thoughts?

On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <m...@mesosphere.com> wrote:

> Hi Vinod:
>
> Yeah, `CLEAR_FILTERS` sounds good.
>
> UNSUPPRESS should be used whenever a currently suppressed framework wants
> to resume getting offers after a previous SUPPRESS call.
>
> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> call it whenever the framework wants to clear all the existing filters.
>
> To elaborate, a framework declines offers and accumulates filters while it
> is trying to satisfy a particular set of requirements/constraints to perform
> an operation. Once that operation is done and the next one comes, if
> the new operation has the same (or strictly more) resource
> requirements/constraints compared to the last one, then it is more
> efficient to KEEP the existing filters instead of getting useless offers
> and rebuilding the filters again.
>
> On the other hand, if the requirements/constraints are different (i.e. some
> of the previous requirements could be loosened), then the existing
> filters no longer make sense. It might then be a good idea to clear all the
> existing filters to improve the chance of getting more offers.
>
> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> `REVIVE` call, its usage should be independent of suppression/revival. The
> decision to clear the filters only depends on whether the existing filters
> make sense for the current operation constraints/requirements.
>
> Examples:
> If a framework first launches a task, then wants to launch a replacement
> task (because the first task failed), then it should keep the filters built
> up during the first launch. However, if the framework wants to launch a
> second task with a completely different resource profile, then clearing
> filters might help to get more (otherwise filtered) offers and hence speed
> up the deployment.
>
> -Meng
>
> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vinodk...@apache.org> wrote:
>
> > Hi Meng,
> >
> > What would be the recommendation for framework authors on when to use
> > UNSUPPRESS vs CLEAR_FILTER?
> >
> > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >
> > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <m...@mesosphere.com> wrote:
> >
> >> Hi:
> >>
> >> tl;dr: We are proposing to add two new V1 scheduler APIs, unsuppress and
> >> clear_filter, in order to decouple the dual semantics of the current
> >> revive call.
> >>
> >> As pointed out in the Mesos framework scalability guide
> >> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> >> utilizing the suppress
> >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >> call is the key to scaling your cluster to a large number of frameworks
> >> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> >> In short, when a framework is idling with no intention to launch any
> >> tasks, it should suppress to tell Mesos to stop sending it offers, and
> >> it should revive
> >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >> when new work arrives. This way, the allocator will skip the framework
> >> when performing resource allocations. As a result, thorny issues such as
> >> offer starvation and resource fragmentation would be greatly mitigated.
> >>
> >> That being said, the suppress/revive calls are currently a little
> >> unwieldy due to MESOS-9028
> >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>
> >> The revive call has two semantics. It unsuppresses the framework AND
> >> clears all the existing filters. The latter makes the revive call
> >> non-idempotent. Also, users may sometimes want to keep the existing
> >> filters when reviving, which is not possible at the moment.
> >>
> >> To decouple the semantics, as suggested in the ticket, we propose to add
> >> two new V1 scheduler calls:
> >>
> >> (1) `UNSUPPRESS` call requests Mesos to resume sending offers;
> >> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
> >>
> >> To make life easier, both calls will return 200 OK (as opposed to 202
> >> returned by most existing scheduler calls, including `SUPPRESS` and
> >> `REVIVE`).
> >>
> >> We will keep the revive call and its semantics (i.e. unsuppress AND
> >> clear filters) for backward compatibility.
> >>
> >> Note, the changes are proposed for the V1 API only. Thus, once the
> >> changes have landed, framework developers are encouraged to move to the
> >> V1 API to take advantage of the new calls (among many other benefits).
> >>
> >> Any feedback/comments are welcome.
> >>
> >> -Meng
> >>
> >
>
