Re: New scheduler API proposal: unsuppress and clear_filter

Benjamin Mahler Mon, 10 Dec 2018 17:07:03 -0800

I think we're agreed:

    -There are no schedulers modeling the existing per-agent time-based
filters that Mesos is tracking, and we shouldn't go in a direction that
encourages frameworks to try to model and manage these. So, we should be
very careful in considering something like CLEAR_FILTERS. We're probably
also agreed that the current filters aren't so great. :)
    -Letting a scheduler have more explicit control over the offers it gets
(both in shape of the offers and overall quantity of resources) is a good
direction to go in to reduce the inefficiency in the pessimistic offer
model.
    -Combining matchers of model (2) with REVIVE may eliminate the need for
CLEAR_FILTERS. I think once you have global matchers in play, it eliminates
the need for the existing decline filters to involve resource subsets and
we may be able to move new schedulers forward with a better model without
breaking old schedulers.


I don’t think model (1) was understood as intended. Schedulers would not be
expressing limits; they would be expressing a "request" equivalent to “how
much more they want”. The internal effective limit (equal to
allocation+request) is just an implementation detail here that demonstrates
how it fits cleanly into the allocation algorithm. So, if a scheduler needs
to run 10 tasks with [1 cpu, 10GB mem], they would express a request of
[10 cpus, 100GB mem] regardless of how much else is already allocated at
that role/scheduler node.
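
To make the arithmetic of model (1) concrete, here is a rough sketch in Python. The names (`Resources`, `request_for`, `effective_limit`) and the "already allocated" figures are made up for illustration and are not part of any Mesos API; the only things taken from the text above are the 10 tasks of [1 cpu, 10GB mem] and "effective limit = allocation + request".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical scalar resource quantity (not a Mesos type)."""
    cpus: float = 0.0
    mem_gb: float = 0.0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus + other.cpus, self.mem_gb + other.mem_gb)

    def scaled(self, count: int) -> "Resources":
        return Resources(self.cpus * count, self.mem_gb * count)


def request_for(task_shape: Resources, task_count: int) -> Resources:
    # What the scheduler would send: "how much more" it wants,
    # independent of whatever is already allocated to it.
    return task_shape.scaled(task_count)


def effective_limit(allocation: Resources, request: Resources) -> Resources:
    # Allocator-internal detail (assumed): the scheduler never computes this.
    return allocation + request


task_shape = Resources(cpus=1, mem_gb=10)
request = request_for(task_shape, 10)          # 10 cpus, 100GB mem

# Illustrative numbers only: pretend 4 cpus / 32GB are already allocated.
already_allocated = Resources(cpus=4, mem_gb=32)
limit = effective_limit(already_allocated, request)
```

The point of the sketch is that the scheduler only ever computes `request`; the sum in `effective_limit` would live inside the allocator.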

From a scheduler's perspective the difference between the two models is:

(1) expressing "how much more" you need
(2) expressing an offer "matcher"

So:

(1) covers the middle part of the demand quantity spectrum we currently
have: unsuppressed -> infinite additional demand, suppressed -> 0
additional demand, and now also unsuppressed w/ request of X -> X
additional demand

(2) is a global filtering mechanism to avoid getting offers in an unusable
shape

They both solve inefficiencies we have, and they're complementary: a
"request" could actually consist of (1) and (2), e.g. "I need an additional
10 cpus, 100GB mem, and I want offers to contain [1 cpu, 10GB mem]".
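
As a rough illustration of how (1) and (2) could combine, here is a small Python sketch. Everything in it is hypothetical (`Quantity`, `additional_demand`, `matches`, `should_offer` are invented names, not Mesos code). It assumes a matcher is a minimum resource quantity, that any one matching matcher is enough, and that an empty matcher list matches everything; the existing decline filters are left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Quantity:
    """Hypothetical scalar resource quantity (not a Mesos type)."""
    cpus: float
    mem_gb: float


def additional_demand(suppressed: bool, request: Optional[Quantity]) -> Quantity:
    # Model (1): the demand spectrum described above.
    if suppressed:
        return Quantity(0, 0)                    # suppressed -> 0
    if request is None:
        return Quantity(math.inf, math.inf)      # unsuppressed -> infinite
    return request                               # unsuppressed w/ request X -> X


def matches(matcher: Quantity, offer: Quantity) -> bool:
    # Model (2): the offer must contain at least the matcher's quantity.
    return offer.cpus >= matcher.cpus and offer.mem_gb >= matcher.mem_gb


def should_offer(offer: Quantity, suppressed: bool,
                 request: Optional[Quantity], matchers: List[Quantity]) -> bool:
    demand = additional_demand(suppressed, request)
    has_demand = demand.cpus > 0 or demand.mem_gb > 0
    # Assumption: no matchers means "any shape is acceptable".
    shape_ok = not matchers or any(matches(m, offer) for m in matchers)
    return has_demand and shape_ok


# "I need an additional 10 cpus, 100GB mem, and I want offers to contain
# [1 cpu, 10GB mem]".
request = Quantity(cpus=10, mem_gb=100)
matchers = [Quantity(cpus=1, mem_gb=10)]

usable = should_offer(Quantity(2, 16), False, request, matchers)
too_small = should_offer(Quantity(0.5, 64), False, request, matchers)
while_suppressed = should_offer(Quantity(2, 16), True, request, matchers)
```

Here `usable` comes out true, while `too_small` and `while_suppressed` come out false: the quantity in (1) decides whether anything is offered at all, and the matcher in (2) only screens out shapes the scheduler could not use.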

I'll schedule a meeting to discuss further. We should also make sure we
come back to the original problem in this thread around REVIVE retries.

On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi Ben et al.,
>
> I'd expect frameworks to *always* know how to accept or decline offers in
> general. More involved frameworks might know how to suppress offers. I
> don't expect that any framework models filters and their associated
> durations in detail (that's why I called them a Mesos implementation
> detail) since there is not much benefit to a framework's primary goal of
> running tasks as quickly as possible.
>
> > I couldn't quite tell how you were imagining this would work, but let me
> > spell out the two models that I've been considering, and you can tell me if
> > one of these matches what you had in mind or if you had a different model
> > in mind:
>
> > (1) "Effective limit" or "give me this much more" ...
>
> This sounds more like an operator-type than a framework-type API to me.
> I'd assume that frameworks would not worry about their total limit the way
> an operator would, but instead care about getting resources to run a
> certain task at a point in time. I could also imagine this being easy to
> use incorrectly as frameworks would likely need to understand their total
> limit when issuing the call which could require state or coordination among
> internal framework components (think: multi-purpose frameworks like
> Marathon or Aurora).
>
> > (2) "Matchers" or "give me things that look like this": when a scheduler
> > expresses its "request" for a role, it would act as a "matcher" (opposite
> > of filter). When mesos is allocating resources, it only proceeds if
> > (requests.matches(resources) && !filters.filtered(resources)). The open
> > ended aspect here is what a matcher would consist of. Consider a case where
> > a matcher is a resource quantity and multiple are allowed; if any matcher
> > matches, the result is a match. This would be equivalent to letting
> > frameworks specify their own --min_allocatable_resources for a role (which
> > is something that has been considered). The "matchers" could be more
> > sophisticated: full resource objects just like filters (but global), full
> > resource objects but with quantities for non-scalar resources like ports,
> > etc.
>
> I was thinking in this direction, but what you described is more involved
> than what I had in mind as a possible first attempt. I'd expect that
> frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> as a way to manage their filter state tracked in the allocator. Assuming we
> have some way to express resource quantities (i.e., MESOS-9314), we should
> be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which
> clears all filters for resources containing the requested resources (or all
> filters if no explicit resource request). Even if that led to more offers
> than needed it would likely still perform better than `REVIVE` (or
> `CLEAR_FILTERS` which has similar semantics). If we keep the scope of these
> calls narrow and clear we have freedom to be smarter in the future
> internally.
>
> This should not only be pretty straightforward to implement in Mesos, but
> I'd imagine also map pretty well onto framework use cases (i.e., I assume
> frameworks are interested in controlling the resources they are offered,
> not in managing filters we maintain for them).
>
> > With regard to incentives, the incentive today for adhering to suppress
> > is that your framework will be doing less processing of offers when it has
> > no work to do and that other instances of your own framework as well as
> > other frameworks would get resources faster. The second aspect is indeed
> > indirect. The incentive structure with "request" / "demand" does indeed
> > seem to be more direct (while still having the indirect benefit on other
> > frameworks / roles): "I'll tell you what to show me so that I get it
> > faster".
>
> Additionally, by potentially explicitly introducing filters as a framework
> API concept, we ask the majority of framework authors to reason about an
> aspect they didn't have to worry about up until then (previously: "if work
> arrives, revive, and decline until an offer can be accepted, then
> suppress"). If we provided them something which fits their *current mental
> model* while also giving them more control, we have a higher chance of it
> being globally useful and adopted than if we'd add an expert-level knob.
>
> > However, as far as performance is concerned, we still need suppress
> > adoption and not just request adoption. Suppress is actually the bigger
> > performance win at the current time, unless we think that frameworks with
> > no work would "effectively suppress" via requests (e.g. "no work? set a 0
> > request so nothing matches"). Note though, that "effectively suppressing"
> > via requests has the same incentive structure as suppress itself, right?
>
> I was also wondering about how what I suggested would fit here as we have
> two concepts controlling if and which offers a framework gets (a single
> global flag for suppress, and a zoo of many fine-grained filters).
> Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems that
> explicitly adding framework control over filters to that might restrict
> what we can do internally in the future. Right now the API gives us some
> freedom in how we interpret declines; we could, e.g., merge filters which
> expire at the same time, or even interpret filters on all cluster resources
> interchangeably with a suppressed state (the API would actually allow us to
> put a framework into suppressed state -- maybe for some time -- even before
> it has seen all resources). If we exposed filters we would lose some of that
> implementation freedom, and we should make sure it is worth it.
>
> As for incentives, if we finally added `REQUEST_RESOURCES` we’d allow
> frameworks to make their interaction with Mesos more declarative yet
> conceptually not much harder. Even if we (Mesos) wouldn’t be able to
> implement optimal handling right away, it could already be useful
> with an MVP implementation on the Mesos side. Also, it would open up
> potential for future optimizations with frameworks already "speaking the
> right protocol".
>
>
>
> Cheers,
>
> Benjamin
>
>
