[ https://issues.apache.org/jira/browse/MESOS-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088363#comment-15088363 ]
Gabriel Hartmann commented on MESOS-4306:
-----------------------------------------
When you say the results of a GET against /master/maintenance/status provide a
list of DOWN machines (temporarily or permanently), are you saying there is an
indication of whether the machine is down permanently or not? Or are you
saying I'm only told that the machine is down? Or maybe I'm told how long it's
expected to be down, with Infinity being an option?
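To make the question concrete, here is a minimal sketch of a framework reading the DOWN machines out of a /master/maintenance/status response. The JSON shape (a "down_machines" array of objects with optional "hostname"/"ip" fields) and the sample hosts are assumptions for illustration, not taken from this thread; check them against the Mesos maintenance documentation for your version. In this assumed shape a DOWN entry only identifies the machine and has no field saying whether, or for how long, it stays down, which is exactly the ambiguity being asked about.

```python
import json
from typing import List

# Hypothetical sample response; the shape is an assumption, not verified output.
SAMPLE_STATUS = """
{
  "draining_machines": [
    {"id": {"hostname": "agent-1", "ip": "10.0.0.1"}, "statuses": []}
  ],
  "down_machines": [
    {"hostname": "agent-2", "ip": "10.0.0.2"}
  ]
}
"""


def down_machine_hosts(status_json: str) -> List[str]:
    """Return the hostname (falling back to the IP) of every DOWN machine."""
    status = json.loads(status_json)
    hosts = []
    for machine in status.get("down_machines", []):
        # Each entry is assumed to be a machine ID with optional hostname/ip;
        # nothing here distinguishes "down for now" from "gone for good".
        hosts.append(machine.get("hostname") or machine.get("ip", "<unknown>"))
    return hosts


print(down_machine_hosts(SAMPLE_STATUS))  # ['agent-2']
```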
> AGENT_DEAD Message
> ------------------
>
> Key: MESOS-4306
> URL: https://issues.apache.org/jira/browse/MESOS-4306
> Project: Mesos
> Issue Type: Task
> Reporter: Gabriel Hartmann
>
> Frameworks currently receive SLAVE_LOST messages when an Agent fails or is
> behind a network partition for some period of time. However, frameworks, and
> indeed Mesos itself, cannot differentiate between an Agent being temporarily
> or permanently lost.
> It would be good to have a message indicating that an Agent is lost and won't
> be returning. Because this requires human intervention, an endpoint should be
> exposed that an operator can call to trigger sending this message.
> This is particularly helpful for frameworks that are waiting for the return
> of persistent volumes. When an Agent hosts significant (multi-terabyte) data,
> a framework may be willing to wait a long time before, for example,
> repairing its replication factor. Explicit, human-provided information about
> whether Agents, and therefore their resources, are permanently gone would let
> these frameworks shorten their recovery timelines.
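The trade-off described in the issue can be sketched as a framework-side policy: wait out a grace period after SLAVE_LOST in case the Agent comes back with its volumes, but re-replicate immediately once an operator has declared the Agent dead. Everything here is hypothetical: AGENT_DEAD is only a proposal in this issue, and `AgentState`, `ReplicaPolicy`, `lost_grace_seconds`, and `should_rereplicate` are illustrative names, not part of any Mesos API.

```python
from dataclasses import dataclass
from enum import Enum


class AgentState(Enum):
    ACTIVE = "active"
    LOST = "lost"   # SLAVE_LOST today: the Agent may still come back
    DEAD = "dead"   # proposed AGENT_DEAD: operator says it will not return


@dataclass
class ReplicaPolicy:
    # Hypothetical knob: how long to wait for a LOST Agent's persistent
    # volumes before repairing replication (long for multi-terabyte data).
    lost_grace_seconds: float


def should_rereplicate(state: AgentState, seconds_since_lost: float,
                       policy: ReplicaPolicy) -> bool:
    """Decide whether to start repairing the replication factor now."""
    if state is AgentState.DEAD:
        # Permanent loss confirmed by a human: no reason to keep waiting.
        return True
    if state is AgentState.LOST:
        # Temporary or permanent is unknown: wait out the grace period.
        return seconds_since_lost >= policy.lost_grace_seconds
    return False
```

For example, with `ReplicaPolicy(lost_grace_seconds=6 * 3600)`, a LOST Agent seen 10 minutes ago does not yet trigger repair, while a DEAD one does immediately. The explicit DEAD signal is what lets the framework skip the grace period instead of guessing.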