[
https://issues.apache.org/jira/browse/MESOS-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriel Hartmann updated MESOS-4306:
------------------------------------
Comment: was deleted
(was: Yes, I thought it might overlap with the maintenance primitives as
well. You're suggesting this as a proposed implementation method, correct? The
AGENT_DEAD message / endpoint would use the same implementation as the
maintenance primitives, right?)
> AGENT_DEAD Message
> ------------------
>
> Key: MESOS-4306
> URL: https://issues.apache.org/jira/browse/MESOS-4306
> Project: Mesos
> Issue Type: Task
> Reporter: Gabriel Hartmann
>
> Frameworks currently receive SLAVE_LOST messages when an Agent fails or is
> behind a network partition for some period of time. However, frameworks, and
> indeed Mesos itself, cannot tell whether an Agent is temporarily or
> permanently lost.
> It would be good to have a message indicating that an Agent is lost and won't
> be returning. Deciding this requires human intervention, so an endpoint should
> be exposed to trigger the sending of this message.
> This is particularly helpful for frameworks that are waiting for the return
> of persistent volumes. Where an Agent hosts a significant amount of data
> (multiple terabytes), the framework may be willing to wait a significant
> amount of time before repairing its replication factor, for example. Explicit,
> human-provided information about the permanent state of Agents, and therefore
> of their resources, would allow these kinds of frameworks to accelerate their
> recovery timelines.
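The recovery behaviour described above can be illustrated with a minimal
sketch of a framework-side policy. It is not Mesos code: RecoveryPolicy, its
methods, and the wait_seconds parameter are hypothetical names introduced for
illustration, and AGENT_DEAD is only the message proposed in this issue. The
sketch assumes the framework waits out a grace period after SLAVE_LOST but
repairs immediately once an Agent is declared dead.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# SLAVE_LOST is the message frameworks receive today; AGENT_DEAD is the
# message proposed in this issue and does not exist yet (assumption).
SLAVE_LOST = "SLAVE_LOST"
AGENT_DEAD = "AGENT_DEAD"


@dataclass
class RecoveryPolicy:
    """Hypothetical framework-side policy for repairing replication."""

    wait_seconds: float  # grace period before giving up on a lost agent
    lost_since: Dict[str, float] = field(default_factory=dict)

    def on_event(self, event: str, agent_id: str, now: float) -> List[str]:
        """Return the agents whose data should be re-replicated right now."""
        if event == AGENT_DEAD:
            # An operator confirmed the agent is gone: skip the grace period.
            self.lost_since.pop(agent_id, None)
            return [agent_id]
        if event == SLAVE_LOST:
            # The agent may still return: only record when it was first lost.
            self.lost_since.setdefault(agent_id, now)
        return []

    def on_agent_returned(self, agent_id: str) -> None:
        """Forget a lost agent that re-registered before the timeout."""
        self.lost_since.pop(agent_id, None)

    def expired(self, now: float) -> List[str]:
        """Return and forget agents lost for at least wait_seconds."""
        gone = [a for a, t in self.lost_since.items()
                if now - t >= self.wait_seconds]
        for agent_id in gone:
            del self.lost_since[agent_id]
        return gone
```

With only SLAVE_LOST, the framework has to poll expired() and pay the full
grace period before repairing; with an explicit AGENT_DEAD, on_event() returns
the agent immediately.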
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)