[
https://issues.apache.org/jira/browse/MESOS-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923031#comment-13923031
]
Benjamin Mahler commented on MESOS-544:
---------------------------------------
To break this down into increments for GSOC students:
(1) Implement a signal handler for a clean shutdown of the slave that kills
all running tasks. This can likely re-use the existing slave shutdown mechanism.
(2) The problem with (1) is that the Master may wait up to the health
checking delay (~75 seconds) before notifying the framework that the tasks
were lost. To improve this, we should consider having the slave send an
unregistration request, versus sending status updates.
(3) stretch: longer term, it may be beneficial to make Frameworks aware of
operator-induced drains. This introduces possibly unnecessary complexity, so
the point of (3) is to explore the tradeoffs of exposing explicit draining
information to frameworks.
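
As a rough illustration of increment (1), here is a minimal sketch in Python
rather than the slave's actual C++ code. It assumes a POSIX host and that tasks
are plain child processes; the names drain_requested, request_drain, drain and
install_drain_handler are hypothetical and are not Mesos APIs. The handler only
records the request, and the kill logic runs outside signal context.

```python
import signal
import subprocess
import sys
import threading

# Set by the signal handler; the real work happens outside the handler.
drain_requested = threading.Event()


def request_drain(signum, frame):
    """Signal handler: only record that a drain was requested."""
    drain_requested.set()


def install_drain_handler(signum=signal.SIGUSR1):
    """Register request_drain for signum (SIGUSR1 assumed, as in the issue)."""
    signal.signal(signum, request_drain)


def drain(tasks, grace_seconds=5.0):
    """Stop every task in the list and return {pid: exit code}.

    Each task gets SIGTERM first; any task still alive after
    grace_seconds is sent SIGKILL. The list is emptied afterwards.
    """
    for task in tasks:
        task.terminate()  # SIGTERM on POSIX
    results = {}
    for task in tasks:
        try:
            task.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            task.kill()  # escalate to SIGKILL
            task.wait()
        results[task.pid] = task.returncode
    tasks.clear()
    return results


def spawn_sleeper(seconds=60):
    """Hypothetical stand-in for a running task: a child that just sleeps."""
    code = "import time; time.sleep({})".format(seconds)
    return subprocess.Popen([sys.executable, "-c", code])
```

A main loop built on this would call install_drain_handler(), block on
drain_requested.wait(), and then call drain(tasks) once the operator sends
SIGUSR1. The sketch does not address (2): it tells nobody that the tasks were
lost.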
> Mesos-slave support for "node drain"
> ------------------------------------
>
> Key: MESOS-544
> URL: https://issues.apache.org/jira/browse/MESOS-544
> Project: Mesos
> Issue Type: Story
> Components: framework, master, slave
> Reporter: Tobias Weingartner
> Labels: gsoc2014
> Fix For: 0.19.0
>
>
> Given that multiple frameworks can be present on a machine at a time, and
> writing "node drain" for each possible framework is an intractable task, it
> would be nice if the slave-master core had a means to tell frameworks that
> tasks were killed to drain a host. Or possibly that the slave was told to
> drain the host of all tasks (graceful shutdown, etc).
> {noformat}
> # drain current host
> pkill -USR1 mesos-slave
> {noformat}
> This would make writing scripts for site-ops to do node maintenance much
> easier... :)