[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890850#comment-15890850
 ] 

Tim Harper commented on MESOS-6223:
-----------------------------------

This should help fix an issue we are seeing with tasks and reserved resources 
in Marathon:

https://github.com/mesosphere/marathon/issues/5284

In Marathon's case, when a resident task (one with reserved resources) becomes 
unreachable because its node reboots, we never receive a terminal state for 
the task, even though the host comes back online. We believe this is because 
during reconciliation we send the old agent ID and the task ID, and Mesos 
keeps reporting an unknown status. If the agent in question kept the same 
agent ID, then an explicit reconciliation of that agent ID plus the task ID 
should, I think, result in a status update that definitively signals a 
terminal state.
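
For illustration, here is a rough sketch of what an explicit reconciliation 
request for one agent ID plus task ID pair could look like, assuming the JSON 
form of the RECONCILE call in the Mesos v1 scheduler HTTP API. The helper name 
build_reconcile_call and all ID values are made up for this example, it is not 
how Marathon actually issues the request, and sending the call to the master 
is left out.

```python
import json


def build_reconcile_call(framework_id, task_id, agent_id):
    """Build the body of an explicit RECONCILE call for a single task.

    Hypothetical helper: the field layout follows the v1 scheduler API
    as I understand it; the IDs passed in below are placeholders.
    """
    return {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE",
        "reconcile": {
            "tasks": [
                {
                    "task_id": {"value": task_id},
                    # If the agent kept its ID across the reboot, this is
                    # the same ID the framework recorded before the reboot.
                    "agent_id": {"value": agent_id},
                }
            ]
        },
    }


call = build_reconcile_call("example-framework", "example-task", "example-agent")
print(json.dumps(call, indent=2))
```

An empty "tasks" list would instead request implicit reconciliation, where 
the master reports the state of every task it knows for the framework.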

> Allow agents to re-register post a host reboot
> ----------------------------------------------
>
>                 Key: MESOS-6223
>                 URL: https://issues.apache.org/jira/browse/MESOS-6223
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Megha Sharma
>            Assignee: Megha Sharma
>
> The agent doesn’t recover its state after a host reboot; it registers with 
> the master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are terminated on the agent when it reboots anyway, so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a prerequisite for supporting 
> persistent/restartable tasks in Mesos (MESOS-3545).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
