[ https://issues.apache.org/jira/browse/MESOS-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766699#comment-16766699 ]
Greg Mann commented on MESOS-9541:
----------------------------------

After discussing this with some other committers, I think we're going to punt on it for the time being. It would be misleading to send an operation update to a framework when these agents are removed, since subsequent reconciliation would return OPERATION_UNKNOWN: the agents are not transitioned into a well-defined state in this case; they simply disappear. We can provide a proper solution for this once MESOS-9556 is resolved. Marking that ticket as blocking this one.

> Transition agent operations to some "lost" state when the agent is removed.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-9541
>                 URL: https://issues.apache.org/jira/browse/MESOS-9541
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1
>            Reporter: Chun-Hung Hsiao
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: foundations, mesosphere
>
> MESOS-8782 and MESOS-8783 transition operations to
> {{OPERATION_GONE_BY_OPERATOR}} or {{OPERATION_UNREACHABLE}} when their agents
> are marked as gone or unreachable respectively. However, there are other
> cases where agents can be "removed" and forgotten by the master:
> 1) When an agent tries to register with a new ID from the same IP:
> https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L6836-L6849
> 2) When an agent requests to unregister:
> https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L7817-L7840
> In these cases, the master explicitly sends {{TASK_LOST}} for task status
> updates (this also means that [this
> documentation|https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/include/mesos/mesos.proto#L2287-L2288]
> is wrong), but does nothing for operations. We should design proper
> operation status transitions for these cases.
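
For readers skimming the thread, the behaviour described in the issue and the comment can be summarized in a rough sketch. This is not Mesos code: the `RemovalReason` enum and both function names are hypothetical, invented only for illustration. The state names ({{OPERATION_GONE_BY_OPERATOR}}, {{OPERATION_UNREACHABLE}}, {{OPERATION_UNKNOWN}}, {{TASK_LOST}}) are the ones mentioned in this ticket.

```python
from enum import Enum, auto
from typing import Optional


class RemovalReason(Enum):
    """Hypothetical labels for the ways an agent can leave the master."""
    MARKED_GONE = auto()               # handled by MESOS-8782 / MESOS-8783
    MARKED_UNREACHABLE = auto()        # handled by MESOS-8782 / MESOS-8783
    REREGISTERED_WITH_NEW_ID = auto()  # case 1 in the issue description
    UNREGISTERED = auto()              # case 2 in the issue description


def operation_state_on_removal(reason: RemovalReason) -> Optional[str]:
    """Operation status the master sends when an agent is removed.

    Returns None for the cases where, per the issue, the master
    currently sends nothing for operations.
    """
    if reason is RemovalReason.MARKED_GONE:
        return "OPERATION_GONE_BY_OPERATOR"
    if reason is RemovalReason.MARKED_UNREACHABLE:
        return "OPERATION_UNREACHABLE"
    # Cases 1 and 2: no operation update is sent today.
    return None


def task_state_on_removal(reason: RemovalReason) -> Optional[str]:
    """Task status for cases 1 and 2, where the issue says TASK_LOST is sent.

    The other cases are outside what this ticket describes, so the
    sketch returns None for them rather than guessing.
    """
    if reason in (RemovalReason.REREGISTERED_WITH_NEW_ID,
                  RemovalReason.UNREGISTERED):
        return "TASK_LOST"
    return None


def reconciled_operation_state(reason: RemovalReason) -> str:
    """What a later reconciliation would report, per the comment above.

    For cases 1 and 2 the agent simply disappears, so reconciliation
    returns OPERATION_UNKNOWN; sending any other update at removal
    time would contradict it.
    """
    return operation_state_on_removal(reason) or "OPERATION_UNKNOWN"
```

The asymmetry is visible in the last function: for cases 1 and 2 there is no well-defined state to report, which is why the fix is deferred until MESOS-9556 is resolved.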
-- This message was sent by Atlassian JIRA (v7.6.3#76005)