So this proposal would only affect schedulers using the libmesos scheduler 
driver API? Schedulers using the v1 HTTP API would not see any changes in 
behaviour, right?

> On Jun 21, 2019, at 9:56 PM, Andrei Sekretenko <asekrete...@mesosphere.io> 
> wrote:
> 
> Hi all,
> 
> We intend to change the behavior of the suppressOffers() method of
> MesosSchedulerDriver with regard to transparent re-registration.
> 
> Currently, when the driver becomes disconnected from a master, it
> re-registers on its own with an empty set of suppressed roles. This
> un-suppresses all of the framework's suppressed roles.
> 
> The plan is to change this behavior so that the suppression state is
> preserved across this re-registration.
> 
> The required set of suppressed roles will be stored in the driver. The
> driver will re-register with this set (instead of an empty one) and will
> update the stored set whenever it performs a call that modifies the
> suppression state of the roles in the allocator.
> Currently, the driver has two methods that perform such calls:
> suppressOffers() and reviveOffers().
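
To make the proposed bookkeeping concrete, here is a minimal Python sketch. It is a toy model, not the libmesos driver: the class `DriverModel`, its method names, and the message labels ("SUPPRESS", "REVIVE", "REREGISTER") are hypothetical and only mirror the description above.

```python
class DriverModel:
    """Toy model of the proposed bookkeeping; not the real driver API."""

    def __init__(self, roles):
        self.roles = set(roles)
        self.suppressed_roles = set()  # set sent on re-registration
        self.sent = []                 # (label, roles) pairs, for inspection

    def suppress_offers(self):
        # Suppress all roles and remember that they are suppressed.
        self.sent.append(("SUPPRESS", frozenset(self.roles)))
        self.suppressed_roles = set(self.roles)

    def revive_offers(self):
        # Revive all roles and clear the stored set.
        self.sent.append(("REVIVE", frozenset(self.roles)))
        self.suppressed_roles.clear()

    def reregister(self):
        # Transparent re-registration carries the stored set, not an empty one.
        self.sent.append(("REREGISTER", frozenset(self.suppressed_roles)))


driver = DriverModel(["role1", "role2"])
driver.suppress_offers()
driver.reregister()    # re-registers with both roles suppressed
driver.revive_offers()
driver.reregister()    # re-registers with an empty suppressed set
```

In this model the only state the driver keeps is the one set, and both methods named above update it before the next re-registration.
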
> 
> Please feel free to raise any concerns or objections - especially if you
> are aware of any V0 frameworks which (probably implicitly) depend on
> un-suppression of the roles when this re-registration occurs.
> 
> 
> 
> Note that:
> - Frameworks which do not call suppressOffers() are, obviously, unaffected
> by this change.
> 
> - Frameworks that reliably prevent transparent re-registration (for
> example, by calling driver.abort() immediately from the disconnected()
> callback) should also be unaffected.
> 
> - Storing the suppressed roles list for re-registration and clearing it in
> reviveOffers() do not change anything for existing frameworks. It is
> setting this list in suppressOffers() that might be a cause for concern.
> 
> - I'm using the word "un-suppression" because re-registering with roles
> removed from the suppressed roles list is NOT equivalent to performing a
> REVIVE call for these roles (unlike REVIVE, it does not clear offerFilters
> in the allocator).
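
The difference between un-suppression and REVIVE in the last note can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the Mesos allocator: `AllocatorModel`, its methods, and the representation of offer filters as a per-role set of agent names are all hypothetical.

```python
class AllocatorModel:
    """Toy allocator state: suppressed roles plus per-role offer filters."""

    def __init__(self):
        self.suppressed = set()
        self.offer_filters = {}  # role -> set of filtered agents (assumed shape)

    def set_suppressed_roles(self, roles):
        # Effect of re-registration in this model: only the suppressed set
        # is replaced; existing offer filters are left alone.
        self.suppressed = set(roles)

    def revive(self, roles):
        # REVIVE in this model: un-suppress AND clear the offer filters.
        for role in roles:
            self.suppressed.discard(role)
            self.offer_filters.pop(role, None)


def fresh():
    a = AllocatorModel()
    a.suppressed = {"role1"}
    a.offer_filters = {"role1": {"agent1"}}
    return a


unsuppressed = fresh()
unsuppressed.set_suppressed_roles([])  # re-registration with an empty set

revived = fresh()
revived.revive(["role1"])              # explicit REVIVE

print(unsuppressed.offer_filters)  # {'role1': {'agent1'}}
print(revived.offer_filters)       # {}
```

In both cases "role1" ends up un-suppressed, but only the REVIVE path drops the filter, which is why the two are not interchangeable.
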
> 
> =====
> A bit of background on why this change is needed.
> 
> To properly support V0 frameworks with a large number of roles, the
> driver must not change the suppression state of the roles on its own.
> Therefore, because the driver performs transparent re-registration, we
> will need to store the required suppression state in the driver and make
> it re-register using this state.
> 
> We could possibly avoid the proposed change to suppressOffers() by adding
> a new interface to the driver for changing the suppression state, leaving
> suppressOffers() as it is, and marking it as deprecated.
> 
> However, this would leave the behaviour of suppressOffers() deeply
> inconsistent with everything else.
> Compare the following two sequences of events.
> First one:
> - The framework creates and starts a driver with roles "role1", "role2"...
> "role500"; the driver registers.
> - The framework calls a new method driver.suppressOffersForRoles({"role1",
> ..., "role500"}); the driver performs a SUPPRESS call for these roles and
> stores them in its suppressed roles set.
>   (Alternative with the same result: the framework calls
> driver.updateFramework(FrameworkInfo, suppressedRoles={"role1", ...,
> "role500"}); the driver performs an UPDATE_FRAMEWORK call with those
> parameters and stores the new suppressed roles set.)
> - The driver, for some reason, disconnects and re-registers with the
> same master, providing the stored suppressed roles set.
> - All the roles are still suppressed.
> Second one:
> - The framework creates and starts a driver with roles "role1", "role2"...
> "role500"; the driver registers.
> - The framework calls driver.suppressOffers(); the driver performs a
> SUPPRESS call for all roles, but doesn't modify the required suppression
> state.
> - The driver, for some reason, disconnects and re-registers with the
> same master, providing the stored suppressed roles set, which is empty.
> - Now none of the roles are suppressed, and the allocator generates offers
> for 500 roles, which will likely be declined by the framework.
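
The two sequences can be compared side by side in a few lines of Python. This is only a sketch of the state transitions described above: the function name and its `suppress_updates_stored_set` flag are hypothetical, and the master is assumed to simply adopt whatever suppressed set the driver provides on re-registration.

```python
def suppressed_after_reregistration(roles, suppress_updates_stored_set):
    """Return the roles still suppressed after a transparent re-registration."""
    stored = set()             # driver-side set sent on re-registration
    master_suppressed = set()  # what the master/allocator treats as suppressed

    # The framework suppresses all of its roles.
    master_suppressed = set(roles)
    if suppress_updates_stored_set:
        stored = set(roles)    # first sequence: the driver remembers the set

    # The driver disconnects and re-registers, providing the stored set.
    master_suppressed = set(stored)
    return master_suppressed


roles = [f"role{i}" for i in range(1, 501)]
first = suppressed_after_reregistration(roles, suppress_updates_stored_set=True)
second = suppressed_after_reregistration(roles, suppress_updates_stored_set=False)
print(len(first), len(second))  # 500 0
```

The only difference between the two runs is whether the suppress call updates the stored set, and that alone decides whether 500 roles or none remain suppressed afterwards.
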
> 
> This is one of the examples that make us strongly consider altering the
> interaction between suppressOffers() and the transparent re-registration
> when we add storage of the suppression state to the driver.
> 
> Regards,
> Andrei Sekretenko
