[
https://issues.apache.org/jira/browse/MESOS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743930#comment-14743930
]
Joris Van Remoortere commented on MESOS-1474:
---------------------------------------------
{code}
commit ce9c75d3eefe370e0ca87a294e96c6d2ae6cb566
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:32:46 2015 -0400

    Maintenance Primitives: Prevent Slave registration from DOWN machine.

    Review: https://reviews.apache.org/r/37623

commit 147420e3e591c4b2674d3f84252066bc5d4b660c
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:55:25 2015 -0400

    Maintenance Primitives: Shutdown slave when maintenance is started.

    Review: https://reviews.apache.org/r/37622

commit ea961908dadcf71234f95b2465e118c89cfca60c
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:50:41 2015 -0400

    Maintenance Primitives: Handle inverse offers in pre-V1 scheduler.

    Review: https://reviews.apache.org/r/37621

commit a127671a726542e21cc7bc8838aa882b6bec4b49
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:28:53 2015 -0400

    Maintenance Primitives: Added Accept / Decline for InverseOffers.

    Review: https://reviews.apache.org/r/37284

commit bf82689f69a21286177d52d7d7e5d2f713c1e5b1
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:50:21 2015 -0400

    Maintenance Primitives: Used V1 API for Master maintenance test.

    Review: https://reviews.apache.org/r/37283

commit 388eaa5b133c4e1b4757a26c5e4afb84ad7bf08d
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:24:16 2015 -0400

    Maintenance Primitives: Added URL field to InverseOffer protobuf.

    Review: https://reviews.apache.org/r/37234

commit e6375f319914741c652bca7c9b97049e81828f5e
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:24:03 2015 -0400

    Maintenance Primitives: Implemented Master::inverseOffer.

    Review: https://reviews.apache.org/r/37180

commit 42f9ce5d61bf3e2c48d6a3de86d2e3e5cd3f6b57
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:23:51 2015 -0400

    Maintenance Primitives: Added updateInverseOffer to Allocator.

    Review: https://reviews.apache.org/r/37280

commit 6c568bacea42f251bc68526a642533fe95e7bcf3
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:23:37 2015 -0400

    Maintenance Primitives: Added InverseOffer to V1 API.

    Review: https://reviews.apache.org/r/37282

commit c702a2c55e3e6d893b959e22f29bb18cb34ffdbb
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:22:03 2015 -0400

    Maintenance Primitives: Added InverseOffers to Scheduler Event Offers.

    Review: https://reviews.apache.org/r/37178

commit a1de99f42323d8eb1396fcd10884eaac32a93eab
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:41:21 2015 -0400

    Maintenance Primitives: Added inverse offers.

    Review: https://reviews.apache.org/r/37177

commit 8e042581671fba360c92378ba47dee5a7d2b0f34
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:19:40 2015 -0400

    Maintenance Primitives: Added a new allocation overload to sorter.

    This provides the ability to compute the frameworks that currently have
    resources allocated or reserved. This information is used by the
    maintenance feature to send out inverse offers.

    Review: https://reviews.apache.org/r/37176

commit f87f733dbd34e39c91125fabe541269aea806266
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:40:10 2015 -0400

    Maintenance Primitives: Added updateUnavailability to master.

    Review: https://reviews.apache.org/r/37175

commit ea48105aa68f249dd409c60ed1dd4998f3498e1e
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 14:04:47 2015 -0400

    Maintenance Primitives: Added unavailability to Allocator Slave struct.

    Review: https://reviews.apache.org/r/37173

commit ee1eb2ba6b17cba66ad99f4e6344416c2d2709d2
Author: Joris Van Remoortere <[email protected]>
Date: Tue Aug 25 18:39:35 2015 -0400

    Maintenance Primitives: Set offer `unavailability` for maintenance.

    Review: https://reviews.apache.org/r/37172

commit 9e7ee6b26f8afe419c7758327fc9ce9f580e0b54
Author: Joris Van Remoortere <[email protected]>
Date: Sun Aug 30 13:56:56 2015 -0400

    Maintenance Primitives: Added `MachineID` to Slave struct in Master.

    Review: https://reviews.apache.org/r/37170
{code}
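The sorter commit in the list describes computing which frameworks currently hold allocated or reserved resources so that the master can send them inverse offers, and the first commit refuses slave registration from a DOWN machine. The Python sketch below models both checks over plain dictionaries. `MasterModel`, its fields, and the UP/DRAINING mode names are assumptions for illustration only; they are not the actual Mesos C++ classes or allocator interfaces.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class MachineMode(Enum):
    # Only DOWN is named in the commits; UP and DRAINING are assumed here.
    UP = "UP"
    DRAINING = "DRAINING"
    DOWN = "DOWN"


@dataclass
class MasterModel:
    """Hypothetical, heavily simplified stand-in for master/allocator state."""
    machine_modes: Dict[str, MachineMode] = field(default_factory=dict)
    # slave id -> ids of frameworks with resources allocated on that slave
    allocated: Dict[str, Set[str]] = field(default_factory=dict)
    # slave id -> ids of frameworks with resources reserved on that slave
    reserved: Dict[str, Set[str]] = field(default_factory=dict)

    def can_register(self, machine_id: str) -> bool:
        # Refuse slave registration while the machine is DOWN; machines
        # with no recorded mode are treated as UP.
        mode = self.machine_modes.get(machine_id, MachineMode.UP)
        return mode is not MachineMode.DOWN

    def inverse_offer_targets(self, slave_id: str) -> Set[str]:
        # Every framework holding allocated or reserved resources on the
        # slave should be told about its planned unavailability.
        return self.allocated.get(slave_id, set()) | self.reserved.get(slave_id, set())
```

For example, `MasterModel(machine_modes={"m1": MachineMode.DOWN}).can_register("m1")` evaluates to `False`, while a machine in any other mode may register.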
> Provide cluster maintenance primitives for operators.
> -----------------------------------------------------
>
> Key: MESOS-1474
> URL: https://issues.apache.org/jira/browse/MESOS-1474
> Project: Mesos
> Issue Type: Epic
> Components: framework, master, slave
> Reporter: Benjamin Mahler
> Assignee: Artem Harutyunyan
> Labels: mesosphere, twitter
>
> Sometimes operators need to perform maintenance on a Mesos cluster; we define
> maintenance here as anything that requires the tasks to be drained on the
> slave(s). Most Mesos upgrades can be done without affecting running tasks,
> but there are situations where maintenance is task-affecting:
> * Host maintenance (e.g. hardware repair, kernel upgrades).
> * Non-recoverable slave upgrades (e.g. adjusting slave attributes).
> * etc.
> In order to ensure operators don’t violate frameworks’ SLAs, schedulers need
> to be aware of planned unavailability events.
> Maintenance awareness allows schedulers to avoid churn for long-running tasks
> by placing them on machines not undergoing maintenance. If all resources are
> scheduled for maintenance, the scheduler will prefer the machines whose
> maintenance is least imminent.
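The placement preference described above can be sketched as a sort over offers. This is a minimal Python illustration, assuming each offer carries an optional unavailability start time (a simplification of the offer `unavailability` field mentioned in the commit list); `Offer` and `rank_for_long_running` are hypothetical names, not part of the Mesos API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    hostname: str
    # Start of planned unavailability in seconds since the epoch, or None
    # when no maintenance is scheduled (hypothetical simplification).
    unavailability_start: Optional[float] = None


def rank_for_long_running(offers: List[Offer]) -> List[Offer]:
    """Order offers from most to least preferable for a long-running task."""
    def key(offer: Offer):
        if offer.unavailability_start is None:
            # No maintenance scheduled: always preferred.
            return (0, 0.0)
        # Maintenance scheduled: a later start is less imminent, so sort on
        # the negated start time to put the latest start first.
        return (1, -offer.unavailability_start)
    return sorted(offers, key=key)
```

The tuple key keeps offers without scheduled maintenance in front and, among the rest, orders them by how far away their maintenance window is.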
> Maintenance awareness is also crucial when a scheduler uses [persistent
> disk|https://issues.apache.org/jira/browse/MESOS-1554] resources, so that the
> scheduler knows how long a persistent disk resource is expected to be
> unavailable. For example, with three 1TB replicas, there is no need to
> re-replicate 1TB over the network when only one of the three replicas will be
> unavailable for a reboot (< 1 hour).
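The persistent disk example can be framed as a simple re-replication rule. The sketch below is an assumed policy, not one taken from Mesos: re-replicate only if too few replicas would remain available, if the unavailability has no known end, or if the expected downtime exceeds what the framework is willing to tolerate. The function name and its parameters are hypothetical.

```python
from typing import Optional


def should_rereplicate(
    total_replicas: int,
    unavailable_replicas: int,
    min_available: int,
    expected_downtime_s: Optional[float],
    max_tolerated_downtime_s: float,
) -> bool:
    """Hypothetical policy: decide whether to copy data to a new replica."""
    available = total_replicas - unavailable_replicas
    if available < min_available:
        # Too few live copies remain, regardless of how short the outage is.
        return True
    if expected_downtime_s is None:
        # No known end to the unavailability: treat it as possibly permanent.
        return True
    # Otherwise only pay the network cost for outages longer than tolerated.
    return expected_downtime_s > max_tolerated_downtime_s
```

With three replicas, one of them down for a 30-minute reboot, a minimum of two available copies, and a one-hour tolerance, `should_rereplicate(3, 1, 2, 1800, 3600)` returns `False`, matching the example in the description.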
> There are a few primitives of interest here:
> * Provide a way for operators to [fully shut down a
> slave|https://issues.apache.org/jira/browse/MESOS-1475] (killing all tasks
> running on it). This is colloquially known as a "hard drain".
> * Provide a way for operators to mark specific slaves as scheduled for
> maintenance. This will inform the scheduler about the scheduled
> unavailability of the resources.
> * Provide a way for frameworks to be notified when resources are requested to
> be relinquished. This gives the framework the opportunity to proactively move
> a task before it is forcibly killed by an operator. It also allows the
> automation of operations like: "please drain these slaves within 1 hour."
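A framework's response to a request such as "please drain these slaves within 1 hour" might be sketched as follows. This assumes, for illustration only, that each task has a known migration time, that tasks are moved one at a time, and that the framework accepts the inverse offer only when it can move every task before the deadline; `Task` and `plan_drain` are hypothetical names, not Mesos scheduler API calls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    task_id: str
    migration_s: float  # hypothetical estimate of time to move this task


def plan_drain(tasks: List[Task], now_s: float, deadline_s: float) -> Tuple[bool, List[str]]:
    """Plan serial task migrations ahead of a hypothetical drain deadline.

    Returns (accept, moved_task_ids); accept is True only when every task
    can be moved before the deadline.
    """
    budget = deadline_s - now_s
    moved: List[str] = []
    elapsed = 0.0
    # Shortest migrations first, so as many tasks as possible finish in time.
    for task in sorted(tasks, key=lambda t: t.migration_s):
        if elapsed + task.migration_s > budget:
            break
        elapsed += task.migration_s
        moved.append(task.task_id)
    return len(moved) == len(tasks), moved
```

When migrations run serially against a shared deadline, moving the shortest tasks first maximizes the number of tasks relocated before any of them would be forcibly killed.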
> See the [design
> doc|https://docs.google.com/a/twitter.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit#]
> for the latest details.