[ https://issues.apache.org/jira/browse/AURORA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Khutornenko reassigned AURORA-651:
----------------------------------------
Assignee: Maxim Khutornenko
> perform_maintenance_hosts should not temporarily remove machines
> ----------------------------------------------------------------
>
> Key: AURORA-651
> URL: https://issues.apache.org/jira/browse/AURORA-651
> Project: Aurora
> Issue Type: Task
> Components: Client
> Reporter: David Robinson
> Assignee: Maxim Khutornenko
>
> The aurora_admin tool provides the following drain/maintenance commands:
> - start_maintenance_hosts
> The list of hosts is marked for maintenance and will be de-prioritized
> for scheduling. Note that the hosts are not removed from consideration,
> and tasks may still be scheduled on them if resources are very scarce.
> Usually you would mark a larger set of machines for maintenance and then
> drain them in batches within that set, so that tasks evicted from one
> batch do not land on hosts that will be drained in a subsequent batch.
> - host_maintenance_status
> Print the drain status of each supplied host.
> - perform_maintenance_hosts
> Asks the scheduler to remove any running tasks from the machines and
> take them out of service temporarily, perform some action on them, then
> return the machines to service.
> - end_maintenance_hosts
> The list of hosts is no longer marked as drained. This allows normal
> scheduling to resume on the given hosts.
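To make the command lifecycle above concrete, here is a minimal Python model of how the four commands move a host between maintenance states, as this issue describes them. It is a sketch, not the aurora_admin implementation: the state names are an assumption loosely based on Aurora's MaintenanceMode values, and the functions are hypothetical stand-ins for the commands. What it shows is that perform_maintenance ends by returning the hosts to NONE, so a host is only briefly DRAINED.

```python
from enum import Enum


class Mode(Enum):
    """Maintenance states; names assumed, loosely modeled on Aurora's MaintenanceMode."""
    NONE = "NONE"            # normal scheduling
    SCHEDULED = "SCHEDULED"  # de-prioritized, tasks may still land here
    DRAINING = "DRAINING"    # running tasks are being moved off the host
    DRAINED = "DRAINED"      # no tasks running, no new tasks placed


def start_maintenance(hosts, state):
    """Hypothetical stand-in for start_maintenance_hosts: de-prioritize hosts."""
    for host in hosts:
        state[host] = Mode.SCHEDULED


def end_maintenance(hosts, state):
    """Hypothetical stand-in for end_maintenance_hosts: resume normal scheduling."""
    for host in hosts:
        state[host] = Mode.NONE


def perform_maintenance(hosts, state, post_drain=None):
    """Simplified model of perform_maintenance_hosts as described in the issue."""
    for host in hosts:
        state[host] = Mode.DRAINING
        # (the scheduler would kill and reschedule the host's tasks here)
        state[host] = Mode.DRAINED
    if post_drain is not None:
        for host in hosts:
            post_drain(host)  # stands in for the --post_drain_script
    # The step this issue objects to: hosts go straight back into service.
    end_maintenance(hosts, state)
```

After `start_maintenance(["host1", "host2"], state)` and `perform_maintenance(["host1"], state)`, host1 is back in `Mode.NONE` while host2 is still in `Mode.SCHEDULED`. Leaving a host drained would require skipping the final `end_maintenance` step, which this model, like the command described in the issue, never does.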
> The command that actually drains a machine is perform_maintenance_hosts;
> however, it only drains a machine *temporarily*. As soon as the machine is
> drained, it is placed back into service, allowing tasks to be scheduled on
> it again. This default behavior is wrong. The expected workflow is that the
> --post_drain_script option is used, and the script is expected to shut down
> the slave, typically by SSHing in and stopping the Mesos process.
> It is not obvious that perform_maintenance_hosts must be run with
> --post_drain_script pointing at such a script to properly drain a machine,
> and the admin tool does not provide any other command that drains a
> machine *and leaves it drained*.
> The ideal solution is described in AURORA-43.
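For illustration, a post-drain script of the kind the issue describes might look like the Python sketch below. Everything concrete in it is an assumption rather than documented aurora_admin behavior: that the script receives the hostname as its first argument, that non-interactive SSH to the host works, and that the slave is stopped with `sudo service mesos-slave stop`. Adjust all three to your site.

```python
"""Hypothetical --post_drain_script: stop the Mesos slave on a drained host.

Assumptions (not from the issue): the hostname arrives as argv[1], the operator
can SSH to the host without a prompt, and the slave runs as a 'mesos-slave' service.
"""
import subprocess
import sys

# Site-specific assumption; change to however the Mesos slave is managed.
STOP_COMMAND = "sudo service mesos-slave stop"


def build_ssh_command(host, remote_command=STOP_COMMAND):
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def main(argv, run=subprocess.run):
    """Stop the slave on argv[1]; returns a process exit code."""
    if len(argv) != 2:
        print("usage: %s HOSTNAME" % argv[0], file=sys.stderr)
        return 2
    result = run(build_ssh_command(argv[1]))
    return result.returncode
```

The `run` parameter exists only so the SSH call can be swapped out in tests. To use the file as an executable script, add a standard `if __name__ == "__main__": sys.exit(main(sys.argv))` entry point.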
--
This message was sent by Atlassian JIRA
(v6.2#6252)