Joris Van Remoortere created MESOS-6828:
-------------------------------------------
Summary: Consider ways for frameworks to ignore offers with an
Unavailability
Key: MESOS-6828
URL: https://issues.apache.org/jira/browse/MESOS-6828
Project: Mesos
Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Artem Harutyunyan
Because maintenance primitives in Mesos are opt-in, cluster administrators
have no reliable way to drain agents when frameworks have not opted in.
An example case:
- Cluster with reasonable churn (tasks terminate naturally)
- Operator specifies maintenance schedule
Ideally, *even* in a world where none of the frameworks had opted in to
maintenance primitives, the operator would have some way of preventing
frameworks from scheduling further work on agents in the schedule. The natural
termination of the tasks in the cluster would allow the nodes to drain
gracefully and the operator to then perform maintenance.
Two options have been discussed so far:
# Provide a capability for frameworks to automatically filter offers with an
{{Unavailability}} set.
#* Pro: Finer-grained control. Allows other frameworks to keep scheduling
short-lived tasks that can complete before the {{Unavailability}} begins.
#* Con: All frameworks have to be updated. Consider exposing this as an
environment variable on the scheduler driver for legacy frameworks.
# Provide a flag on the master to filter all offers with an {{Unavailability}}
set.
#* Pro: Immediately actionable / usable.
#* Con: Coarse-grained. Some frameworks may lose scheduling efficiency.
#* Con: *Dangerous*: planning out a multi-day maintenance schedule for an
entire cluster would prevent every framework from scheduling further work,
potentially stalling the cluster.
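
To make the trade-off between the two options concrete, here is a minimal sketch of both filtering behaviours. It does not use any real Mesos API: the {{Offer}} and {{Unavailability}} classes, the field names {{start_ns}} and {{duration_ns}}, and the functions {{blanket_filter}} and {{fine_grained_filter}} are all hypothetical stand-ins that only loosely resemble the protobuf messages, and the assumption that a framework knows its task's expected runtime is ours, not something the ticket states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unavailability:
    # Hypothetical stand-in: start of the maintenance window, in nanoseconds.
    start_ns: int
    # None is treated here as an open-ended window (an assumption).
    duration_ns: Optional[int] = None


@dataclass
class Offer:
    # Hypothetical stand-in for a resource offer.
    offer_id: str
    unavailability: Optional[Unavailability] = None


def blanket_filter(offers: List[Offer]) -> List[Offer]:
    """Option 2 style: drop every offer that carries an Unavailability."""
    return [o for o in offers if o.unavailability is None]


def fine_grained_filter(
    offers: List[Offer], now_ns: int, task_runtime_ns: int
) -> List[Offer]:
    """Option 1 style: keep an offer if it has no Unavailability, or if a
    task of the given runtime would finish before the window starts."""
    kept = []
    for o in offers:
        u = o.unavailability
        if u is None or now_ns + task_runtime_ns <= u.start_ns:
            kept.append(o)
    return kept


HOUR = 3600 * 10**9  # one hour in nanoseconds

offers = [
    Offer("a"),                                      # not in the schedule
    Offer("b", Unavailability(start_ns=2 * HOUR)),   # maintenance in 2 hours
    Offer("c", Unavailability(start_ns=48 * HOUR)),  # maintenance in 2 days
]

# A 3-hour task does not fit on "b" but fits on "c"; the blanket filter
# rejects both.
blanket_ids = [o.offer_id for o in blanket_filter(offers)]
fine_ids = [o.offer_id for o in fine_grained_filter(offers, 0, 3 * HOUR)]
```

In this sketch {{blanket_ids}} contains only "a", while {{fine_ids}} contains "a" and "c". If every agent in the cluster carried an {{Unavailability}}, {{blanket_filter}} would return an empty list, which is the stalling scenario described in the last Con above.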
Action Items: Provide further context for each option and consider others. We
need to ensure we have something immediately consumable by users to fill the
gap until maintenance primitives are the norm. We also need to ensure we
prevent dangerous scenarios like the Con listed for option #2.