[ 
https://issues.apache.org/jira/browse/MESOS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949915#comment-15949915
 ] 

Joris Van Remoortere commented on MESOS-6828:
---------------------------------------------

Based on some offline discussion I want to suggest that the least dangerous 
solution (in my opinion) is to have frameworks prefer offers with the longest 
availability by default.

Aurora is a good example of a framework that collects offers and has the 
ability to express a preference while iterating the offers to match a task to 
launch.
Preferring offers with no unavailability, or with the unavailability furthest 
in the future, will naturally steer new tasks away from machines that are 
about to enter maintenance.
A benefit of this approach is that the agents in the schedule will still be 
used if there is demand pressure for resources by the framework.
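
To make the preference concrete, here is a minimal sketch of the selection 
logic, not Aurora's actual implementation. The Offer class, its field names 
and pick_offer are hypothetical stand-ins for illustration; a real framework 
would read the start time from the offer's Unavailability instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Hypothetical stand-in for a Mesos offer; not the real protobuf type."""
    agent: str
    cpus: float
    # Start of the agent's unavailability in nanoseconds since the epoch,
    # or None when the agent is not in a maintenance schedule.
    unavailability_start_ns: Optional[int] = None


def availability_key(offer: Offer) -> float:
    """Larger means available for longer; no unavailability ranks highest."""
    if offer.unavailability_start_ns is None:
        return float("inf")
    return float(offer.unavailability_start_ns)


def pick_offer(offers: List[Offer], cpus_needed: float) -> Optional[Offer]:
    """Return the fitting offer with the longest availability, if any."""
    fitting = [o for o in offers if o.cpus >= cpus_needed]
    return max(fitting, key=availability_key, default=None)
```

Because this is a preference and not a filter, an agent in the schedule is 
still chosen when it is the only offer large enough for the task.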

> Consider ways for frameworks to ignore offers with an Unavailability
> --------------------------------------------------------------------
>
>                 Key: MESOS-6828
>                 URL: https://issues.apache.org/jira/browse/MESOS-6828
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Joris Van Remoortere
>            Assignee: Artem Harutyunyan
>              Labels: maintenance
>
> Due to the opt-in nature of maintenance primitives in Mesos, there is a 
> deficiency for cluster administrators when frameworks have not opted in.
> An example case:
> - Cluster with reasonable churn (tasks terminate naturally)
> - Operator specifies maintenance schedule
> Ideally *even* in a world where none of the frameworks had opted in to 
> maintenance primitives the operator would have some way of preventing 
> frameworks from scheduling further work on agents in the schedule. The 
> natural termination of the tasks in the cluster would allow the nodes to 
> drain gracefully and the operator to then perform maintenance.
> 2 options that have been discussed so far:
> # Provide a capability for frameworks to automatically filter offers with an 
> {{Unavailability}} set.
> #* Pro: Finer grained control. Allows other frameworks to keep scheduling 
> short lived tasks that can complete before the Unavailability.
> #* Con: All frameworks have to be updated. Consider making this an 
> environment variable to the scheduler driver for legacy frameworks.
> # Provide a flag on the master to filter all offers with an 
> {{Unavailability}} set.
> #* Pro: Immediately actionable / usable.
> #* Con: Coarse grained. Some frameworks may lose efficiency.
> #* Con: *Dangerous*: planning out a multi-day maintenance schedule for an 
> entire cluster will prevent any frameworks from scheduling further work, 
> potentially stalling the cluster.
> Action Items: Provide further context for each option and consider others. We 
> need to ensure we have something immediately consumable by users to fill the 
> gap until maintenance primitives are the norm. We also need to ensure we 
> prevent dangerous scenarios like the Con listed for option #2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
