This is an automated email from the ASF dual-hosted git repository.

bbannier pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/1.7.x by this push:
     new d8acd9c  Garbage-collected lost tasks which are reported as running again.
d8acd9c is described below

commit d8acd9cfacd2edf8500f07f63a8837aa0ddd14ba
Author: Benjamin Bannier <[email protected]>
AuthorDate: Fri Nov 1 13:08:35 2019 +0100

    Garbage-collected lost tasks which are reported as running again.
    
    Under certain conditions, tasks which were previously `TASK_LOST` and
    completed can reappear in non-terminal states, e.g., if the agent on
    which they were running reconnects.
    
    This patch adds garbage collection of such completed tasks so that users
    do not see tasks twice when obtaining task information from the master
    API. This change does not affect task status updates, where we already
    correctly reported a previously `TASK_LOST` state as superseded by, e.g.,
    `TASK_RUNNING`.
    
    Review: https://reviews.apache.org/r/71641/
---
 src/master/master.cpp | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/master/master.cpp b/src/master/master.cpp
index 1f7dd3a..838c801 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -7484,6 +7484,24 @@ void Master::__reregisterSlave(
       Framework* framework = getFramework(frameworkId);
       if (framework != nullptr) {
         framework->unreachableTasks.erase(task.task_id());
+
+        // The master transitions task to terminal state on its own in certain
+        // scenarios (e.g., framework or agent teardown) before instructing the
+        // agent to remove it. However, we are not guaranteed that the message
+        // reaches the agent and is processed by it. If the agent fails to act
+        // on the message, tasks the master has declared terminal might reappear
+        // from the agent as non-terminal, see e.g., MESOS-9940.
+        //
+        // Avoid tracking a task as both terminal and non-terminal by
+        // garbage-collecting completed tasks which come back as running.
+        framework->completedTasks.erase(
+            std::remove_if(
+                framework->completedTasks.begin(),
+                framework->completedTasks.end(),
+                [&](const Owned<Task>& task_) {
+                  return task_.get() && task_->task_id() == task.task_id();
+                }),
+            framework->completedTasks.end());
       }
 
       const string message = slaves.unreachable.contains(slaveInfo.id())
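
The hunk above relies on the erase-remove idiom: `std::remove_if` shifts the
entries to keep to the front of the container and returns the new logical end,
and `erase` then drops the tail. A minimal standalone sketch of the same
pattern follows. It is an illustration only, not Mesos code: the `Task` struct,
the `dropCompleted` helper, and the use of `std::vector` and `std::unique_ptr`
in place of the container and `Owned<Task>` used by the master are all
assumptions made so that the example compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a task record; only the ID matters here.
struct Task {
  std::string id;
};

// Hypothetical helper: removes every completed entry whose ID matches
// `runningId` and returns how many entries were removed. Null entries
// never match and are therefore left in place.
std::size_t dropCompleted(
    std::vector<std::unique_ptr<Task>>& completed,
    const std::string& runningId)
{
  // Move the entries to keep to the front; `newEnd` marks the new logical end.
  const auto newEnd = std::remove_if(
      completed.begin(),
      completed.end(),
      [&](const std::unique_ptr<Task>& t) {
        return t != nullptr && t->id == runningId;
      });

  const std::size_t removed =
    static_cast<std::size_t>(std::distance(newEnd, completed.end()));

  // Drop the leftover tail so the container shrinks to the kept entries.
  completed.erase(newEnd, completed.end());
  return removed;
}
```

As in the patch, the null check comes before the ID comparison, so an empty
pointer in the container is never dereferenced.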
