James, yes that's correct.

On Sat, Jun 22, 2019 at 12:05 AM James Peach <jpe...@apache.org> wrote:
> So this proposal would only affect schedulers using the libmesos scheduler
> driver API? Schedulers using the v1 HTTP would not get any changes in
> behaviour, right?
>
> > On Jun 21, 2019, at 9:56 PM, Andrei Sekretenko <asekrete...@mesosphere.io> wrote:
> >
> > Hi all,
> >
> > we are intending to change the behavior of the suppressOffers() method of
> > MesosSchedulerDriver with regard to the transparent re-registration.
> >
> > Currently, when the driver becomes disconnected from a master, it performs
> > a re-registration on its own with an empty set of suppressed roles. This
> > causes un-suppression of all the suppressed roles of the framework.
> >
> > The plan is to alter this behavior so that the suppression state is
> > preserved on this re-registration.
> >
> > The required set of suppressed roles will be stored in the driver, which
> > will now perform re-registration with this set (instead of an empty
> > one) and update the stored set whenever a call modifying the suppression
> > state of the roles in the allocator is performed.
> > Currently, the driver has two methods which perform such calls:
> > suppressOffers() and reviveOffers().
> >
> > Please feel free to raise any concerns or objections - especially if you
> > are aware of any V0 frameworks which (probably implicitly) depend on
> > un-suppression of the roles when this re-registration occurs.
> >
> > Note that:
> > - Frameworks which do not call suppressOffers() are, obviously, unaffected
> > by this change.
> >
> > - Frameworks that reliably prevent transparent re-registration (for
> > example, by calling driver.abort() immediately from the disconnected()
> > callback) should also not be affected.
> >
> > - Storing the suppressed roles list for re-registration and clearing it in
> > reviveOffers() do not change anything for the existing frameworks. It is
> > setting this list in suppressOffers() which might be a cause for concern.
> >
> > - I'm using the word "un-suppression" because re-registering with roles
> > removed from the suppressed roles list is NOT equivalent to performing
> > a REVIVE call for these roles (unlike REVIVE, it does not clear
> > offerFilters in the allocator).
> >
> > =====
> > A bit of background on why this change is needed.
> >
> > To properly support V0 frameworks with a large number of roles, it is
> > necessary for the driver not to change the suppression state of the roles
> > on its own.
> > Therefore, due to the existence of the transparent re-registration in the
> > driver, we will need to store the required suppression state in the driver
> > and make it re-register using this state.
> >
> > We could possibly avoid the proposed change of suppressOffers() by adding
> > a new interface for changing the suppression state to the driver, leaving
> > suppressOffers() as it is, and marking it as deprecated.
> >
> > However, this will leave the behaviour of suppressOffers() deeply
> > inconsistent with everything else.
> > Compare the following two sequences of events.
> > First one:
> > - The framework creates and starts a driver with roles "role1", "role2"...
> > "role500", the driver registers
> > - The framework calls a new method driver.suppressOffersForRoles({"role1",
> > ..., "role500"}), the driver performs a SUPPRESS call for these roles and
> > stores them in its suppressed roles set.
> > (Alternative with the same result: the framework calls
> > driver.updateFramework(FrameworkInfo, suppressedRoles={"role1", ...,
> > "role500"}), the driver performs an UPDATE_FRAMEWORK call with those
> > parameters and stores the new suppressed roles set).
> > - The driver, for some reason, disconnects and re-registers with the
> > same master, providing the stored suppressed roles set.
> > - All the roles are still suppressed
> > Second one:
> > - The framework creates and starts a driver with roles "role1", "role2"...
> > "role500", the driver registers
> > - The framework calls driver.suppressOffers(), the driver performs a
> > SUPPRESS call for all roles, but doesn't modify the required suppression
> > state.
> > - The driver, for some reason, disconnects and re-registers with the
> > same master, providing the stored suppressed roles set, which is empty.
> > - Now, none of the roles are suppressed, and the allocator generates offers
> > for 500 roles which will likely be declined by the framework.
> >
> > This is one of the examples which makes us strongly consider altering the
> > interaction between suppressOffers() and the transparent re-registration
> > when we add storing the suppression state to the driver.
> >
> > Regards,
> > Andrei Sekretenko
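
For illustration, the two sequences of events quoted above can be modelled
with a small Python sketch. This is a toy model only: ToyMaster, ToyDriver,
the preserve_suppression flag and roles_offered_after_reconnect() are
hypothetical names made up for this example and are not part of the Mesos
scheduler driver API. The sketch assumes that a re-registration simply
replaces the suppression state held by the master with the set the driver
sends, and it ignores offer filters entirely.

```python
# Toy model only: none of these classes are the real Mesos API.

class ToyMaster:
    """Stands in for the master/allocator: tracks the suppressed roles."""

    def __init__(self):
        self.suppressed = set()

    def register(self, roles, suppressed_roles):
        # Assumption: (re-)registration replaces the suppression state
        # with whatever set the driver provides.
        self.suppressed = set(suppressed_roles) & set(roles)

    def suppress(self, roles):
        self.suppressed |= set(roles)

    def revive(self, roles):
        self.suppressed -= set(roles)


class ToyDriver:
    """Hypothetical driver that re-registers with a stored suppressed set."""

    def __init__(self, master, roles, preserve_suppression):
        self.master = master
        self.roles = set(roles)
        # False models the current suppressOffers(), True the proposed one.
        self.preserve_suppression = preserve_suppression
        self.stored_suppressed = set()  # set sent on re-registration
        self.master.register(self.roles, self.stored_suppressed)

    def suppress_offers(self):
        self.master.suppress(self.roles)
        if self.preserve_suppression:
            # Proposed behaviour: remember the suppression state.
            self.stored_suppressed = set(self.roles)

    def revive_offers(self):
        self.master.revive(self.roles)
        self.stored_suppressed = set()

    def reconnect(self):
        # Transparent re-registration after a disconnect.
        self.master.register(self.roles, self.stored_suppressed)


def roles_offered_after_reconnect(preserve_suppression):
    """Count roles that are no longer suppressed after suppress + reconnect."""
    roles = ["role%d" % i for i in range(1, 501)]
    master = ToyMaster()
    driver = ToyDriver(master, roles, preserve_suppression)
    driver.suppress_offers()
    driver.reconnect()
    return len(set(roles) - master.suppressed)
```

In this model, roles_offered_after_reconnect(False) corresponds to the
second sequence (all 500 roles become un-suppressed on re-registration),
and roles_offered_after_reconnect(True) corresponds to the proposed
behaviour, where the roles stay suppressed.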