Dominic Hamon created MESOS-2246:
------------------------------------
Summary: Improve slave health-checking
Key: MESOS-2246
URL: https://issues.apache.org/jira/browse/MESOS-2246
Project: Mesos
Issue Type: Epic
Components: master, slave
Reporter: Dominic Hamon
Assignee: Jie Yu
In the event of a network partition, or other systemic issues, we may see
widespread slave removal. There are several approaches we can take to mitigate
this issue, including but not limited to:
. rate limit slave removal
. change how we do health checking to not rely on a single point of view
. work with frameworks to determine the SLA of running services before removing
the slave
. provide manual controls to allow operator intervention
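As an illustration of the first approach, here is a minimal sketch of a sliding-window rate limiter for slave removal. It assumes the master can defer a removal and retry it later, and that allowing at most N removals per time window is an acceptable policy. The names (RemovalRateLimiter, try_acquire, drain) are hypothetical and are not part of the Mesos codebase. Python is used only for brevity; Mesos itself is C++.

```python
from collections import deque


class RemovalRateLimiter:
    """Hypothetical sliding-window limiter for slave removals.

    Permits at most `max_removals` removals within any window of
    `window_secs` seconds. Timestamps are passed in explicitly so the
    behaviour is deterministic and easy to test.
    """

    def __init__(self, max_removals, window_secs):
        self.max_removals = max_removals
        self.window_secs = window_secs
        self._recent = deque()  # timestamps of removals still inside the window

    def try_acquire(self, now):
        # Forget removals that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.window_secs:
            self._recent.popleft()
        if len(self._recent) >= self.max_removals:
            return False  # over the limit: defer this removal
        self._recent.append(now)
        return True


def drain(unhealthy, limiter, step_secs, max_time):
    """Remove unhealthy slaves over time, honouring the limiter.

    Retries every `step_secs` seconds until `max_time`. Returns a list of
    (time, slave_id) pairs for the removals that actually happened.
    """
    pending = deque(unhealthy)
    removed = []
    t = 0
    while pending and t <= max_time:
        if limiter.try_acquire(t):
            removed.append((t, pending.popleft()))
        t += step_secs
    return removed


if __name__ == "__main__":
    # A partition makes three slaves look unhealthy at once; allow one
    # removal per 10 minutes and retry every 5 minutes.
    limiter = RemovalRateLimiter(max_removals=1, window_secs=600)
    for t, slave in drain(["slave-1", "slave-2", "slave-3"], limiter, 300, 3600):
        print(f"t={t}s: removing {slave}")
```

With these parameters the slaves are removed at t=0s, 600s and 1200s rather than all at once, which leaves time for an operator (or a second health-check opinion) to notice that the failures are correlated and intervene before most of the cluster is gone.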
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)