[jira] [Commented] (MESOS-1474) Provide cluster maintenance primitives for operators.

Benjamin Mahler (JIRA) Mon, 16 Jun 2014 14:02:32 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032941#comment-14032941
 ]


Benjamin Mahler commented on MESOS-1474:
----------------------------------------

[~nekto0n], sorry, I was being intentionally vague in order to express the
desires rather than the solutions.

For quite some time we have been discussing the notion of an "inverse" resource
offer as an {{Event}}. An "inverse" resource offer means that Mesos is
requesting resources _back_ from the framework within some time interval, e.g.
1 minute, 1 hour, or, in your example, 1 month. However, I would like this
ticket to express the requirements rather than the abstractions needed to meet
them, so expect to see discussion around inverse offers as things progress!
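
To make the idea a bit more concrete, here is a minimal sketch of what an
inverse offer might carry and how a framework might react to one. Nothing here
is an existing Mesos API: {{InverseOffer}}, its fields, and
{{tasks_to_migrate}} are hypothetical names used purely for illustration, and
the eventual design may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class InverseOffer:
    """Hypothetical event: Mesos asks for a slave's resources back.

    The field names are assumptions for illustration, not a Mesos message.
    """
    slave_id: str
    requested_at: datetime
    interval: timedelta  # e.g. 1 minute, 1 hour, 1 month

    @property
    def deadline(self) -> datetime:
        # The point in time by which the resources should be relinquished.
        return self.requested_at + self.interval


def tasks_to_migrate(inverse_offer, tasks_by_slave):
    """Return the task IDs a framework would have to move off the slave.

    tasks_by_slave is the framework's own (hypothetical) bookkeeping:
    a dict mapping slave ID -> list of task IDs running there.
    """
    return sorted(tasks_by_slave.get(inverse_offer.slave_id, []))


if __name__ == "__main__":
    offer = InverseOffer("slave-1", datetime(2014, 6, 16, 14, 0),
                         timedelta(hours=1))
    running = {"slave-1": ["web-2", "web-1"], "slave-2": ["db-1"]}
    print(offer.deadline, tasks_to_migrate(offer, running))
```

The framework would then have until {{deadline}} to reschedule those tasks
elsewhere before the resources are taken back.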

> Provide cluster maintenance primitives for operators.
> -----------------------------------------------------
>
>                 Key: MESOS-1474
>                 URL: https://issues.apache.org/jira/browse/MESOS-1474
>             Project: Mesos
>          Issue Type: Epic
>          Components: framework, master, slave
>            Reporter: Benjamin Mahler
>
> Normally cluster upgrades can be done seamlessly using the built-in slave 
> recovery feature. However, there are situations where operators want to be 
> able to perform destructive maintenance operations on machines:
> * Non-recoverable slave upgrades.
> * Machine reboots.
> * Kernel upgrades.
> * etc.
> In these situations, best practice is to perform rolling maintenance in large
> batches of machines. This can be problematic for frameworks when many related
> tasks are located within a batch of machines going down for maintenance.
> There are a few primitives of interest here:
> * Provide a way for operators to fully shut down a slave (killing all tasks
> underneath it).
> * Provide a way for operators to mark specific slaves as undergoing 
> maintenance. This means that no more offers are being sent for these slaves, 
> and no new tasks will launch on them.
> * Provide a way for frameworks to be notified when resources are requested to
> be relinquished. This gives the framework a chance to proactively move a task
> before it is forcibly killed. It also allows the automation of operations
> like: "please drain these slaves within 1 hour."
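
The first two primitives in the description above can be illustrated with a
toy model. This is only a sketch under stated assumptions: {{ToyMaster}} and
its methods are hypothetical and do not correspond to any Mesos master code or
operator endpoint. It shows the intended behaviour only: a slave marked for
maintenance receives no further offers or launches, and a full shutdown kills
every task on the slave.

```python
class ToyMaster:
    """Hypothetical in-memory model of the maintenance primitives."""

    def __init__(self, slaves):
        # slave ID -> set of task IDs currently running on that slave
        self.tasks = {slave_id: set() for slave_id in slaves}
        self.in_maintenance = set()

    def mark_maintenance(self, slave_id):
        # Primitive: mark a slave as undergoing maintenance.
        self.in_maintenance.add(slave_id)

    def offerable_slaves(self):
        # No more offers are sent for slaves under maintenance.
        return sorted(s for s in self.tasks if s not in self.in_maintenance)

    def launch(self, slave_id, task_id):
        # No new tasks may launch on a slave under maintenance.
        if slave_id in self.in_maintenance:
            raise ValueError(f"{slave_id} is undergoing maintenance")
        self.tasks[slave_id].add(task_id)

    def shutdown(self, slave_id):
        # Primitive: fully shut down a slave, killing all tasks under it.
        killed = sorted(self.tasks.pop(slave_id))
        self.in_maintenance.discard(slave_id)
        return killed


if __name__ == "__main__":
    master = ToyMaster(["slave-1", "slave-2"])
    master.launch("slave-1", "web-1")
    master.mark_maintenance("slave-1")
    print(master.offerable_slaves())   # slave-1 is no longer offered
    print(master.shutdown("slave-1"))  # tasks killed by the shutdown
```

The third primitive, notifying frameworks ahead of time, is what would let a
framework move tasks before the shutdown step rather than losing them.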



--
This message was sent by Atlassian JIRA
(v6.2#6252)
