[ https://issues.apache.org/jira/browse/MESOS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neil Conway updated MESOS-6619:
-------------------------------
          Sprint: Mesosphere Sprint 48
    Story Points: 8

> Duplicate elements in "completed_tasks"
> ---------------------------------------
>
>                 Key: MESOS-6619
>                 URL: https://issues.apache.org/jira/browse/MESOS-6619
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>
> Scenario:
> # Framework starts non-partition-aware task T on agent A
> # Agent A is partitioned. Task T is marked as a "completed task" in the
> {{Framework}} struct of the master, as part of {{Framework::removeTask}}.
> # Agent A re-registers with the master. The tasks running on A are re-added
> to their respective frameworks on the master as running tasks.
> # In {{Master::\_reregisterSlave}}, the master sends a
> {{ShutdownFrameworkMessage}} for all non-partition-aware frameworks running
> on the agent. The master then does {{removeTask}} for each task managed by
> one of these frameworks, which results in calling {{Framework::removeTask}},
> which adds _another_ task to {{completed_tasks}}. Note that
> {{completed_tasks}} does not attempt to detect/suppress duplicates, so this
> results in two elements in the {{completed_tasks}} collection.
> Similar problems occur when a partition-aware task is running on a
> partitioned agent that re-registers: the result is a task in the {{tasks}}
> list _and_ a task in the {{completed_tasks}} list.
> Possible fixes/changes:
> * Adding a task to the {{completed_tasks}} list when an agent becomes
> partitioned is debatable; certainly for partition-aware tasks, the task is
> not "completed". We might consider adding an "{{unreachable_tasks}}" list to
> the HTTP endpoints.
> * Regardless of whether we continue to use {{completed_tasks}} or add a new
> collection, we should ensure the consistency of that data structure after
> agent re-registration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
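
The scenario in the issue description can be replayed against a toy model to see where the duplicate comes from. The sketch below is not the Mesos implementation: the `Framework` struct, its string-keyed containers, and the helper functions `replayScenario` and `partitionAwareTaskInBothLists` are hypothetical simplifications introduced only for illustration. The one behavior taken from the report is that removing a task appends it to the completed list without checking for an existing entry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for the master's per-framework state.
// The real Mesos `Framework` struct is considerably more involved.
struct Framework {
  std::map<std::string, std::string> tasks;  // task ID -> agent ID (running)
  std::vector<std::string> completedTasks;   // task IDs, no de-duplication

  void addTask(const std::string& taskId, const std::string& agentId) {
    tasks[taskId] = agentId;
  }

  // Mirrors the behavior described in the report: every removal appends to
  // the completed list without checking whether the task is already there.
  void removeTask(const std::string& taskId) {
    assert(tasks.count(taskId) == 1);
    completedTasks.push_back(taskId);
    tasks.erase(taskId);
  }
};

// Replays the four steps of the scenario for a non-partition-aware task and
// returns how many times the task appears in the completed list.
std::size_t replayScenario() {
  Framework framework;
  framework.addTask("T", "A");  // 1. Framework starts task T on agent A.
  framework.removeTask("T");    // 2. Agent A is partitioned.
  framework.addTask("T", "A");  // 3. Agent A re-registers; T is re-added.
  framework.removeTask("T");    // 4. Framework is shut down on A; T removed.

  std::size_t copies = 0;
  for (const std::string& id : framework.completedTasks) {
    if (id == "T") {
      ++copies;
    }
  }
  return copies;
}

// Partition-aware variant: step 4 does not happen, so T ends up in both the
// running map and the completed list.
bool partitionAwareTaskInBothLists() {
  Framework framework;
  framework.addTask("T", "A");
  framework.removeTask("T");
  framework.addTask("T", "A");
  return framework.tasks.count("T") == 1 &&
         framework.completedTasks.size() == 1;
}
```

In this model `replayScenario()` returns 2, matching the two elements described for the non-partition-aware case, and `partitionAwareTaskInBothLists()` returns true, matching the inconsistent state described for the partition-aware case. Whether the fix belongs in the removal path, in the re-registration path, or in a separate collection is left open, as in the issue.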