[ 
https://issues.apache.org/jira/browse/MESOS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6619:
-------------------------------
          Sprint: Mesosphere Sprint 48
    Story Points: 8

> Duplicate elements in "completed_tasks"
> ---------------------------------------
>
>                 Key: MESOS-6619
>                 URL: https://issues.apache.org/jira/browse/MESOS-6619
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>
> Scenario:
> # Framework starts non-partition-aware task T on agent A
> # Agent A is partitioned. Task T is marked as a "completed task" in the 
> {{Framework}} struct of the master, as part of {{Framework::removeTask}}.
> # Agent A re-registers with the master. The tasks running on A are re-added 
> to their respective frameworks on the master as running tasks.
> # In {{Master::_reregisterSlave}}, the master sends a 
> {{ShutdownFrameworkMessage}} for all non-partition-aware frameworks running 
> on the agent. The master then calls {{removeTask}} for each task managed by 
> one of these frameworks, which in turn calls {{Framework::removeTask}} and 
> adds _another_ entry for the same task to {{completed_tasks}}. Because 
> {{completed_tasks}} does not attempt to detect or suppress duplicates, task T 
> ends up with two elements in the {{completed_tasks}} collection.
>
> Similar problems occur when a partition-aware task is running on a 
> partitioned agent that re-registers: the same task then appears in the 
> {{tasks}} list _and_ in the {{completed_tasks}} list.
>
> Possible fixes/changes:
> * Adding a task to the {{completed_tasks}} list when an agent becomes 
> partitioned is debatable; certainly for partition-aware tasks, the task is 
> not "completed". We might consider adding an "{{unreachable_tasks}}" list to 
> the HTTP endpoints.
> * Regardless of whether we continue to use {{completed_tasks}} or add a new 
> collection, we should ensure the consistency of that data structure after 
> agent re-registration.
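
The scenario and the consistency requirement above can be sketched with a small model. This is only an illustrative Python simulation, not the Mesos master code: the `Framework` class, its methods, the `run_scenario` helper, and the `dedupe` flag are all hypothetical names invented for this sketch, and the "fix" shown (dropping the stale completed entry when the agent re-registers) is just one possible way to keep the collections consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Hypothetical, simplified stand-in for the master's Framework struct."""
    partition_aware: bool = False
    tasks: dict = field(default_factory=dict)            # task_id -> agent_id
    completed_tasks: list = field(default_factory=list)  # task IDs; duplicates possible

    def add_task(self, task_id, agent_id):
        self.tasks[task_id] = agent_id

    def remove_task(self, task_id):
        # Models the reported behavior: every removal appends to
        # completed_tasks without checking for an existing entry.
        self.tasks.pop(task_id, None)
        self.completed_tasks.append(task_id)

    def readd_task_consistently(self, task_id, agent_id):
        # Assumed fix (not from the report): drop any stale completed entry
        # for this task before treating it as running again.
        self.completed_tasks = [t for t in self.completed_tasks if t != task_id]
        self.tasks[task_id] = agent_id


def run_scenario(partition_aware, dedupe):
    fw = Framework(partition_aware=partition_aware)
    fw.add_task("T", "A")      # 1. framework starts task T on agent A
    fw.remove_task("T")        # 2. agent A is partitioned
    if dedupe:                 # 3. agent A re-registers, tasks are re-added
        fw.readd_task_consistently("T", "A")
    else:
        fw.add_task("T", "A")
    if not partition_aware:    # 4. non-partition-aware framework is shut down
        fw.remove_task("T")
    return fw


buggy = run_scenario(partition_aware=False, dedupe=False)
fixed = run_scenario(partition_aware=False, dedupe=True)
aware_buggy = run_scenario(partition_aware=True, dedupe=False)
aware_fixed = run_scenario(partition_aware=True, dedupe=True)

print(buggy.completed_tasks)                          # two entries for T
print(fixed.completed_tasks)                          # one entry for T
print(aware_buggy.tasks, aware_buggy.completed_tasks) # T in both collections
print(aware_fixed.tasks, aware_fixed.completed_tasks) # T only in tasks
```

In this model the duplicate arises purely from the order of operations in the steps above, which is why the second proposed change (restoring consistency at re-registration) applies whether the entries live in {{completed_tasks}} or in a separate collection.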



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
