Thanks, Meng, for the explanation. I imagine most frameworks do not remember what they filtered, much less figure out how previously filtered offers could satisfy new operations. That sounds complicated!
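Keeping versus clearing filters may be less complicated than it sounds. Under Meng's rule in the quoted reply, a framework does not need to remember individual filtered offers, only the requirements of the operation the filters were built for. Below is a minimal Python sketch of that check, assuming a simplified profile of CPUs, memory, and a set of placement-constraint labels. `Requirements`, `is_at_least_as_strict`, and `should_clear_filters` are hypothetical names, not part of any Mesos library, and `CLEAR_FILTERS` is the call proposed in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Requirements:
    """Hypothetical, simplified resource profile of one scheduling operation."""
    cpus: float
    mem_mb: int
    # Placement constraints as plain labels, e.g. frozenset({"ssd"}).
    constraints: frozenset = field(default_factory=frozenset)


def is_at_least_as_strict(new: Requirements, old: Requirements) -> bool:
    """Return True when `new` is at least as strict as `old`.

    In that case any offer that could not satisfy `old` cannot satisfy `new`
    either: `new` asks for at least as much of each resource and keeps every
    constraint of `old` (it may add more).
    """
    return (
        new.cpus >= old.cpus
        and new.mem_mb >= old.mem_mb
        and new.constraints >= old.constraints  # superset check
    )


def should_clear_filters(new: Requirements, last: Optional[Requirements]) -> bool:
    """Decide whether to send the proposed CLEAR_FILTERS call before `new`.

    `last` is the operation the current filters were built for. If `new` is at
    least as strict, the filtered offers are still useless, so keep the filters;
    otherwise some of them might now fit, so clearing may speed things up.
    """
    if last is None:
        return False  # no previous operation, so no filters worth clearing
    return not is_at_least_as_strict(new, last)


# Replacement task with the same profile: keep the filters.
print(should_clear_filters(Requirements(1.0, 512), Requirements(1.0, 512)))  # False
# New task that needs fewer CPUs: previously too-small offers may now fit.
print(should_clear_filters(Requirements(0.5, 4096), Requirements(1.0, 512)))  # True
```

This only captures filters built while chasing one operation's requirements; a real framework would also have to account for filters created for other reasons (for example, declines while idle), which this check ignores.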
But I like your example. So one suggestion we could make to frameworks is to use CLEAR_FILTERS when they have new work, e.g., scaling up/down or a new app (they might want to use it even if they are not suppressed!), and to use UNSUPPRESS when they are rescheduling old work. Thoughts?

On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <m...@mesosphere.com> wrote:

> Hi Vinod:
>
> Yeah, `CLEAR_FILTERS` sounds good.
>
> UNSUPPRESS should be used whenever a currently suppressed framework wants to resume getting offers after a previous SUPPRESS call.
>
> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to call it whenever the framework wants to clear all the existing filters.
>
> To elaborate: frameworks decline offers and accumulate filters while they are trying to satisfy a particular set of requirements/constraints to perform an operation. Once that operation is done and the next one comes, if the new operation has the same (or strictly more) resource requirements/constraints compared to the last one, then it is more efficient to KEEP the existing filters than to receive useless offers and rebuild the filters again.
>
> On the other hand, if the requirements/constraints are different (i.e., some of the previous requirements have been loosened), then the existing filters no longer make sense. In that case it might be a good idea to clear all the existing filters to improve the chance of getting more offers.
>
> Note that although we introduce `CLEAR_FILTERS` as part of decoupling the `REVIVE` call, its usage should be independent of suppression/revival. The decision to clear the filters depends only on whether the existing filters make sense for the current operation's constraints/requirements.
>
> Examples:
> If a framework first launches a task, then wants to launch a replacement task (because the first task failed), then it should keep the filters built up during the first launch.
> However, if the framework wants to launch a second task with a completely different resource profile, then clearing filters might help it get more (otherwise filtered) offers and hence speed up the deployment.
>
> -Meng
>
> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vinodk...@apache.org> wrote:
>
> > Hi Meng,
> >
> > What would be the recommendation for framework authors on when to use UNSUPPRESS vs CLEAR_FILTER?
> >
> > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >
> > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <m...@mesosphere.com> wrote:
> >
> >> Hi:
> >>
> >> tl;dr: We are proposing to add two new V1 scheduler API calls, unsuppress and clear_filter, in order to decouple the dual semantics of the current revive call.
> >>
> >> As pointed out in the Mesos framework scalability guide <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>, utilizing the suppress <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress> call is the key to scaling your cluster to a large number of frameworks <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>. In short, when a framework is idle with no intention of launching any tasks, it should suppress to tell Mesos to stop sending it offers, and it should revive <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive> when new work arrives. While a framework is suppressed, the allocator skips it when performing resource allocations. As a result, thorny issues such as offer starvation and resource fragmentation would be greatly mitigated.
> >>
> >> That being said, the suppress/revive calls are currently a bit unwieldy due to MESOS-9028 <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>
> >> The revive call has two semantics.
> >> It unsuppresses the framework AND clears all the existing filters. The latter makes the revive call non-idempotent. And sometimes users may want to keep the existing filters when reviving, which is not possible at the moment.
> >>
> >> To decouple the semantics, as suggested in the ticket, we propose to add two new V1 scheduler calls:
> >>
> >> (1) an `UNSUPPRESS` call that asks Mesos to resume sending offers;
> >> (2) a `CLEAR_FILTER` call that explicitly clears all the existing filters.
> >>
> >> To make life easier, both calls will return 200 OK (as opposed to 202 Accepted, which is returned by most existing scheduler calls, including `SUPPRESS` and `REVIVE`).
> >>
> >> We will keep the revive call and its semantics (i.e., unsuppress AND clear filters) for backward compatibility.
> >>
> >> Note that the changes are proposed for the V1 API only. Thus, once the changes have landed, framework developers are encouraged to move to the V1 API to take advantage of the new calls (among many other benefits).
> >>
> >> Any feedback/comments are welcome.
> >>
> >> -Meng
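The proposed decoupling can be pictured with a toy model of the per-framework state the allocator consults: a suppressed flag and a set of filters. The sketch below is hypothetical Python, not Mesos code. It assumes filters are keyed by agent ID and never expire (real filters come from declines and time out after `Filters.refuse_seconds`), and it mirrors the call names and status codes from the proposal: the new unsuppress and clear-filters calls return 200, while `SUPPRESS`, `REVIVE`, and `DECLINE` return 202. `FrameworkState` and its methods are illustrative names only.

```python
from http import HTTPStatus


class FrameworkState:
    """Toy model of what the allocator tracks per framework (hypothetical).

    `filtered_agents` stands in for decline filters; real Mesos filters also
    cover specific resources and expire after `refuse_seconds`.
    """

    def __init__(self) -> None:
        self.suppressed = False
        self.filtered_agents: set[str] = set()

    def decline(self, agent_id: str) -> HTTPStatus:
        # Declining an offer installs a filter for that agent.
        self.filtered_agents.add(agent_id)
        return HTTPStatus.ACCEPTED

    def suppress(self) -> HTTPStatus:
        # Idle framework: stop receiving offers.
        self.suppressed = True
        return HTTPStatus.ACCEPTED

    def revive(self) -> HTTPStatus:
        # Existing call, kept for backward compatibility: both effects at once.
        self.suppressed = False
        self.filtered_agents.clear()
        return HTTPStatus.ACCEPTED

    def unsuppress(self) -> HTTPStatus:
        # Proposed call: resume offers but keep the existing filters.
        self.suppressed = False
        return HTTPStatus.OK

    def clear_filters(self) -> HTTPStatus:
        # Proposed call: drop all filters, independent of suppression.
        self.filtered_agents.clear()
        return HTTPStatus.OK

    def eligible(self, agent_id: str) -> bool:
        """Would the allocator consider offering this agent's resources?"""
        return not self.suppressed and agent_id not in self.filtered_agents


# Demo: the replacement-task case from the thread.
fw = FrameworkState()
fw.decline("agent-1")               # filter built while placing the first task
fw.suppress()                       # idle: stop receiving offers
assert fw.unsuppress() == HTTPStatus.OK
assert not fw.eligible("agent-1")   # UNSUPPRESS kept the filter
assert fw.eligible("agent-2")
fw.clear_filters()                  # new task with a different resource profile
assert fw.eligible("agent-1")
```

In this model `revive()` has the same effect as `unsuppress()` followed by `clear_filters()`, which is the equivalence that lets the proposal keep `REVIVE` unchanged for backward compatibility.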