This is an automated email from the ASF dual-hosted git repository.

bbannier pushed a commit to branch 1.9.x
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/1.9.x by this push:
     new c313168  Garbage-collected lost tasks which are reported as running again.
c313168 is described below

commit c31316814398990abf1013bb0681a907426a4fec
Author: Benjamin Bannier <[email protected]>
AuthorDate: Fri Nov 1 13:08:35 2019 +0100

    Garbage-collected lost tasks which are reported as running again.
    
    Under certain conditions tasks which were previously `TASK_LOST` and
    completed can reappear in non-terminal states, e.g., if the agent on
    which they were running reconnects.
    
    This patch adds garbage collection of such completed tasks so that users
    do not see tasks twice when obtaining task information from the master
    API. This change does not affect task status updates, where we already
    correctly reported a previously `TASK_LOST` state as superseded by, e.g.,
    `TASK_RUNNING`.
    
    Review: https://reviews.apache.org/r/71641/
---
 src/master/master.cpp | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/master/master.cpp b/src/master/master.cpp
index 933fc89..73507ce 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -7848,6 +7848,24 @@ void Master::__reregisterSlave(
       Framework* framework = getFramework(frameworkId);
       if (framework != nullptr) {
         framework->unreachableTasks.erase(task.task_id());
+
+        // The master transitions task to terminal state on its own in certain
+        // scenarios (e.g., framework or agent teardown) before instructing the
+        // agent to remove it. However, we are not guaranteed that the message
+        // reaches the agent and is processed by it. If the agent fails to act
+        // on the message, tasks the master has declared terminal might reappear
+        // from the agent as non-terminal, see e.g., MESOS-9940.
+        //
+        // Avoid tracking a task as both terminal and non-terminal by
+        // garbage-collecting completed tasks which come back as running.
+        framework->completedTasks.erase(
+            std::remove_if(
+                framework->completedTasks.begin(),
+                framework->completedTasks.end(),
+                [&](const Owned<Task>& task_) {
+                  return task_.get() && task_->task_id() == task.task_id();
+                }),
+            framework->completedTasks.end());
       }
 
       const string message = slaves.unreachable.contains(slaveInfo.id())
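
The hunk above uses the erase-remove idiom to drop completed entries whose ID matches a task the agent reports as running again. Below is a minimal standalone sketch of that idiom, not the Mesos implementation: the `Task` struct, the `gcCompletedTask` helper, and the use of `std::vector` with `std::shared_ptr` in place of the master's container and `Owned<Task>` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the Mesos `Task` message; only the ID matters here.
struct Task {
  std::string task_id;
};

// Hypothetical helper: removes every completed entry whose ID equals
// `runningTaskId` and returns how many entries were dropped. Null entries
// are never matched, mirroring the `task_.get() &&` guard in the patch.
std::size_t gcCompletedTask(
    std::vector<std::shared_ptr<Task>>& completedTasks,
    const std::string& runningTaskId)
{
  // `std::remove_if` shifts the surviving elements to the front (keeping
  // their relative order) and returns the start of the leftover tail.
  const auto newEnd = std::remove_if(
      completedTasks.begin(),
      completedTasks.end(),
      [&](const std::shared_ptr<Task>& completed) {
        return completed != nullptr && completed->task_id == runningTaskId;
      });

  const std::size_t removed =
    static_cast<std::size_t>(std::distance(newEnd, completedTasks.end()));

  // `erase` then shrinks the container by discarding the leftover tail.
  completedTasks.erase(newEnd, completedTasks.end());

  return removed;
}
```

Calling `gcCompletedTask(completed, id)` once for each task the agent reports during reregistration would keep a task from being listed as both completed and running. The returned count is an addition of this sketch, for example for logging; the patch itself discards it.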
