Just to add some color to this: picky scheduling has been a long-standing
issue with the two-level scheduling architecture of Mesos. Because Mesos
does not have enough information from schedulers to pick the offers that a
scheduler wants, it can take a very long time for a scheduler to receive a
usable offer.

In the past couple of years, we shipped some great improvements to the
allocator's performance, as well as to preventing offer starvation. Despite
this, it's still the case that if you have picky apps (e.g. constraints
limit the app to only run on 1 agent in the cluster), it can take a very
long time to finally receive the right offer. The adoption of quota has
exacerbated the issue, because it limits the number of offers you can
receive concurrently (in the pathological case where you have only a small
amount of quota left to consume, you can only receive 1 offer at a time).
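
To make the pathological case concrete, here is a toy model (not Mesos
code): it assumes a framework receives exactly one offer at a time, each for
a different agent in a fixed order, and declines every offer that does not
match. The function name, the agent dictionaries and the fixed ordering are
all assumptions for illustration; the real allocator's ordering is more
involved.

```python
from typing import Callable, Dict, List

Agent = Dict[str, str]


def offers_until_match(agents: List[Agent],
                       wanted: Callable[[Agent], bool]) -> int:
    """Count the offers received, one at a time, until a usable one arrives.

    Returns -1 if no agent satisfies the framework's constraint.
    """
    for count, agent in enumerate(agents, start=1):
        if wanted(agent):
            return count
        # Otherwise the framework declines and waits for the next offer.
    return -1


# Hypothetical 100-agent cluster in which only the last agent is acceptable.
agents = [{"hostname": f"agent-{i}"} for i in range(100)]
wanted = lambda agent: agent["hostname"] == "agent-99"

print(offers_until_match(agents, wanted))
```

In this worst case the framework has to decline every other agent in the
cluster before it sees the one it can use.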

We've previously stated that we would solve this by implementing a new
"optimistic" offer model that employs optimistic concurrency control
(providing an equivalent to what was described in Google's Omega paper).
However, we found that adding constraints-based offer filtering will solve
the picky scheduling issue for the use cases we've seen, and is much easier
to implement for both Mesos and schedulers. Mostly what we see is
schedulers having very picky apps based on some form of constraint (e.g.
Marathon's constraint language, or the presence of a specific resource
reservation).
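
As a rough sketch of the idea of constraints-based offer filtering (this is
not the Mesos API or the design in the doc): assume the framework declares
up front the agent attributes it requires, and only agents matching them are
ever offered to it. The names `matches` and `filtered_offers` and the
equality-only constraint format are hypothetical.

```python
from typing import Dict, Iterable, Iterator

Agent = Dict[str, str]


def matches(agent: Agent, constraints: Dict[str, str]) -> bool:
    """True if the agent has every required attribute with the required value."""
    return all(agent.get(key) == value for key, value in constraints.items())


def filtered_offers(agents: Iterable[Agent],
                    constraints: Dict[str, str]) -> Iterator[Agent]:
    """Yield only the agents whose attributes satisfy the constraints."""
    for agent in agents:
        if matches(agent, constraints):
            yield agent


# Hypothetical cluster with a "rack" attribute on each agent.
agents = [
    {"hostname": "agent-0", "rack": "a"},
    {"hostname": "agent-1", "rack": "b"},
    {"hostname": "agent-2", "rack": "a"},
]

# The first offer this framework sees is already one it can use.
first = next(filtered_offers(agents, {"rack": "b"}))
print(first["hostname"])
```

The point of the sketch is that the filtering happens before an offer is
made, so a picky framework no longer spends offer cycles declining agents it
could never use.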

So, this will be a huge improvement in the next Mesos release for any users
that use reservations or scheduling constraints to limit where their apps
are deployed.

Please let us know if you have any questions! And thanks Andrei for the
hard work on fleshing out the design details.

Ben

On Tue, Jul 28, 2020 at 9:21 AM Andrei Sekretenko <asekrete...@d2iq.com>
wrote:

> Hi all,
> Recently, my colleagues and I have been designing a mechanism in Mesos that
> will allow a framework to put constraints on the contents of the offers it
> receives: on the attributes of offered agents, and, as a next step, on
> resources in the offers, so that the framework is more likely to receive an
> offer it really needs.
> The primary aim of this design is to help "picky" frameworks running in
> the presence of quota reduce scheduling latency.
>
> I've distilled the implementation proposal on the Mesos side into a design
> doc draft:
>
> https://docs.google.com/document/d/1MV048BwjLSoa8sn_5hs4kIH4YJMf6-Gsqbij3YuT1No/edit#heading=h.wq9atl6k4yq0
>
> --
> Best regards,
> Andrei Sekretenko
>
