Andrei Sekretenko created MESOS-10023:
-----------------------------------------
Summary: Allocator method dispatches can be reordered (relative to
scheduler API calls which triggered them).
Key: MESOS-10023
URL: https://issues.apache.org/jira/browse/MESOS-10023
Project: Mesos
Issue Type: Bug
Reporter: Andrei Sekretenko
Observed an example of such reordering on a testing cluster with a V1 framework.
Framework side:
- framework issues ACCEPT for a slave with no operations and a 365+ day
filter
- framework issues REVIVE call for all roles (which should clear all filters)
- framework waits for an offer for that slave and never receives it
Master side:
- master receives ACCEPT, processes the first part and starts authorization
- master receives REVIVE and dispatches reviveOffers() to the allocator
- master receives a response from authorizer (for ACCEPT) and dispatches
recoverResources() with a 365-day filter to the allocator
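The sequence above can be reproduced with a small toy model. This is only a sketch, not Mesos code: the class `Allocator`, the function `run_master`, and the method names `revive_offers` and `recover_resources` are hypothetical stand-ins for the real allocator interface, and the asynchronous authorization is modeled simply by deferring the ACCEPT dispatch until all received calls have been handled.

```python
from collections import deque


class Allocator:
    """Toy allocator: logs dispatches in arrival order and tracks one filter."""

    def __init__(self):
        self.log = []
        self.filtered = False

    def revive_offers(self):
        self.log.append("reviveOffers")
        self.filtered = False  # REVIVE clears all filters

    def recover_resources(self, filter_days):
        self.log.append("recoverResources")
        self.filtered = filter_days > 0  # installs a refuse filter


def run_master(calls):
    """Handle calls in arrival order (hypothetical model of the master).

    ACCEPT reaches the allocator only after a simulated asynchronous
    authorization completes; REVIVE is dispatched immediately.
    """
    allocator = Allocator()
    pending_authorizations = deque()
    for call in calls:
        if call == "ACCEPT":
            # Authorization is asynchronous: defer the allocator dispatch.
            pending_authorizations.append(
                lambda: allocator.recover_resources(365))
        elif call == "REVIVE":
            # No authorization step: dispatch right away.
            allocator.revive_offers()
    # Authorizer responses arrive after the other calls were handled.
    while pending_authorizations:
        pending_authorizations.popleft()()
    return allocator


allocator = run_master(["ACCEPT", "REVIVE"])
print(allocator.log)       # ['reviveOffers', 'recoverResources']
print(allocator.filtered)  # True
```

Although the framework sent ACCEPT before REVIVE, the model's allocator sees them in the opposite order, so the 365-day filter is installed after the revive and stays in place.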
*We need to give frameworks a way to avoid this kind of
reordering.*
Things to consider:
- V1 frameworks are not required to use a single connection for API requests;
even if they were, there is still the reconnection case, during which the
framework's and the master's views of the connection state might differ. This
means that we cannot completely avoid this problem by sequencing the processing
of requests from the same connection.
- Currently, all calls directly influencing the allocator (except for
UPDATE_FRAMEWORK) return '202 ACCEPTED' at an early stage of processing.
Unconditionally changing this might break compatibility with some existing
frameworks.