[ 
https://issues.apache.org/jira/browse/MESOS-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766699#comment-16766699
 ] 

Greg Mann commented on MESOS-9541:
----------------------------------

After discussing this with some other committers, I think we're going to punt 
on it for the time being. It would be misleading to send an operation update to 
a framework when these agents are removed: the agents are not transitioned into 
a well-defined state in this case, they simply disappear, so subsequent 
reconciliation would return OPERATION_UNKNOWN.

We can provide a proper solution for this once MESOS-9556 is resolved. Marking 
that ticket as blocking this one.
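
To illustrate the inconsistency, here is a toy model only, not Mesos code. 
The class, its methods and the agent IDs are hypothetical; only the status 
names come from this ticket and mesos.proto. It assumes reconciliation is 
answered purely from what the master still remembers about the agent.

```python
from enum import Enum


class AgentState(Enum):
    REGISTERED = "registered"
    GONE = "gone"                # marked gone by an operator
    UNREACHABLE = "unreachable"  # marked unreachable


class ToyMaster:
    """Hypothetical stand-in for the master's agent bookkeeping."""

    def __init__(self):
        self.agents = {}  # agent_id -> AgentState

    def register(self, agent_id):
        self.agents[agent_id] = AgentState.REGISTERED

    def mark(self, agent_id, state):
        # Well-defined transition: the master still remembers the agent.
        self.agents[agent_id] = state

    def remove(self, agent_id):
        # The cases in this ticket: the agent is simply forgotten.
        self.agents.pop(agent_id, None)

    def reconcile(self, agent_id, last_status):
        """Return the status a framework would see for an operation."""
        state = self.agents.get(agent_id)
        if state is None:
            return "OPERATION_UNKNOWN"
        if state is AgentState.GONE:
            return "OPERATION_GONE_BY_OPERATOR"
        if state is AgentState.UNREACHABLE:
            return "OPERATION_UNREACHABLE"
        return last_status
```

In this model, any "lost"-style update sent at removal time would be 
contradicted by the next reconciliation, because a removed agent leaves 
nothing behind for reconcile() to answer from.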

> Transition agent operations to some "lost" state when the agent is removed.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-9541
>                 URL: https://issues.apache.org/jira/browse/MESOS-9541
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1
>            Reporter: Chun-Hung Hsiao
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: foundations, mesosphere
>
> MESOS-8782 and MESOS-8783 transition operations to 
> {{OPERATION_GONE_BY_OPERATOR}} or {{OPERATION_UNREACHABLE}} when their agents 
> are marked as gone or unreachable respectively. However, there are other 
> cases where agents can be "removed" and forgotten by the master:
> 1) When an agent tries to register with a new ID from the same IP:
> https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L6836-L6849
> 2) When an agent requests to unregister:
> https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L7817-L7840
> In these cases, the master explicitly sends {{TASK_LOST}} for task status 
> updates (this also means that [this 
> documentation|https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/include/mesos/mesos.proto#L2287-L2288]
>  is wrong), but does nothing for operations. We should design proper 
> operation status transitions for these cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
