Dominic Hamon created MESOS-2246:
------------------------------------

             Summary: Improve slave health-checking
                 Key: MESOS-2246
                 URL: https://issues.apache.org/jira/browse/MESOS-2246
             Project: Mesos
          Issue Type: Epic
          Components: master, slave
            Reporter: Dominic Hamon
            Assignee: Jie Yu


In the event of a network partition or another systemic issue, we may see
widespread slave removal. There are several approaches we can take to mitigate
this, including but not limited to:

. rate-limit slave removal
. change health checking so that it does not rely on a single point of view
. work with frameworks to determine the SLA of running services before removing
a slave
. provide manual controls to allow operator intervention
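
As an illustration of the first approach only, here is a minimal token-bucket sketch in Python. It is not Mesos code: the class name RemovalRateLimiter, the method try_remove, and the parameters rate_per_sec and burst are hypothetical names chosen for this example. The idea is that the master would consult the limiter before removing a slave that failed its health check, so a partition cannot trigger a mass removal all at once.

```python
import time


class RemovalRateLimiter:
    """Hypothetical token bucket that caps how fast slaves may be removed.

    Up to `burst` removals are allowed back to back; after that, permits
    are refilled at `rate_per_sec` removals per second.
    """

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.clock = clock              # injectable so tests can fake time
        self.tokens = float(burst)      # start with a full bucket
        self.last_refill = clock()

    def try_remove(self):
        """Return True if a removal may proceed now, False if it must wait."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller that gets False back would keep the slave registered and retry later, which also leaves time for the partition to heal or for an operator to step in.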




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
