[
https://issues.apache.org/jira/browse/MESOS-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Bannier reassigned MESOS-10018:
----------------------------------------
Shepherd: Benno Evers
Sprint: Foundations: RI-19 57
Assignee: Benjamin Bannier
> Duplicate tasks if agent partitioned during maintenance down period
> -------------------------------------------------------------------
>
> Key: MESOS-10018
> URL: https://issues.apache.org/jira/browse/MESOS-10018
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Major
>
> When the master starts maintenance for a node, it
> (1) sends a {{ShutdownMessage}} to the agent, and
> (2) removes the agent, which transitions all of its tasks to {{TASK_LOST}} and
> moves them to the completed task set.
> If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., message
> dropped between (1) and (2), or agent process killed before the executor has
> shut down), the agent could come back with the lost task running. It would
> report the task on registration with the master, which would add it to the
> list of active tasks. As a result, the same task could be both completed and
> active at the same time.
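
The sequence described above can be reproduced in a small toy model. This is only an illustrative sketch, not Mesos code: the classes `ToyMaster` and `ToyAgent`, their methods, the `drop_shutdown` flag, and the task ID `task-1` are all hypothetical names introduced here, and the model assumes a single agent and that the master accepts whatever tasks the agent reports on registration.

```python
# Toy model of the scenario in this ticket. All names are hypothetical;
# none of this is the Mesos API.


class ToyAgent:
    def __init__(self, running, drop_shutdown):
        self.running = set(running)
        # Assumption: models a ShutdownMessage that is dropped or not
        # fully processed before the agent goes away.
        self.drop_shutdown = drop_shutdown

    def deliver_shutdown(self):
        # A processed shutdown stops all tasks; a dropped one changes nothing.
        if not self.drop_shutdown:
            self.running.clear()


class ToyMaster:
    def __init__(self):
        self.active = {}     # task_id -> state
        self.completed = {}  # task_id -> terminal state

    def launch(self, task_id):
        self.active[task_id] = "TASK_RUNNING"

    def start_maintenance(self, agent):
        # (1) Send the shutdown request; delivery is not guaranteed.
        agent.deliver_shutdown()
        # (2) Remove the agent: every active task becomes TASK_LOST and
        # is moved to the completed set.
        for task_id in list(self.active):
            del self.active[task_id]
            self.completed[task_id] = "TASK_LOST"

    def register(self, agent):
        # Assumption: the master adds every reported task to the active set
        # without checking the completed set.
        for task_id in agent.running:
            self.active[task_id] = "TASK_RUNNING"


def duplicated_tasks(drop_shutdown):
    """Return task IDs that end up both active and completed."""
    master = ToyMaster()
    master.launch("task-1")
    agent = ToyAgent(["task-1"], drop_shutdown)
    master.start_maintenance(agent)
    master.register(agent)
    return sorted(set(master.active) & set(master.completed))
```

In this model, `duplicated_tasks(True)` returns the task that is tracked as both active and completed, while `duplicated_tasks(False)` returns an empty list because the agent stopped the task before registering again.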
--
This message was sent by Atlassian Jira
(v8.3.4#803005)