Repository: mesos
Updated Branches:
  refs/heads/1.5.x 64202a178 -> 35f284724


Improved handling of non-terminal operations after master failover.

This patch fixes the handling of non-terminal operations learned by a
newly elected master after a master failover, so that only these
operations are counted as using resources. Previously we did not count
any operations as using resources, which by accident produced the
expected behavior if an operation was already terminal when the master
learned about it.

We do not address the issue of being unable to properly account for
operations triggered by frameworks unknown to the master, see
MESOS-8582. Instead we emit a warning for now, since the master might
still abort on assertion failures caused by incomplete resource
accounting.

Review: https://reviews.apache.org/r/65482/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/35f28472
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/35f28472
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/35f28472

Branch: refs/heads/1.5.x
Commit: 35f2847245a79ebd17218dbe421bdd38f107960c
Parents: 64202a1
Author: Benjamin Bannier <benjamin.bann...@mesosphere.io>
Authored: Mon Mar 12 18:07:24 2018 +0100
Committer: Benjamin Bannier <bbann...@apache.org>
Committed: Mon Mar 12 18:33:18 2018 +0100

----------------------------------------------------------------------
 src/master/master.cpp | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/35f28472/src/master/master.cpp
----------------------------------------------------------------------
diff --git a/src/master/master.cpp b/src/master/master.cpp
index 7c99371..8ec78a7 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -7662,6 +7662,37 @@ void Master::updateSlave(UpdateSlaveMessage&& message)
           }
 
           addOperation(framework, slave, new Operation(operation));
+
+          if (!protobuf::isTerminalState(operation.latest_status().state())) {
+            // If we do not yet know the `FrameworkInfo` of the framework the
+            // operation originated from, we cannot properly track the operation
+            // at this point.
+            //
+            // TODO(bbannier): Consider introducing ways of making
+            // sure an agent always knows the `FrameworkInfo` of
+            // operations triggered on its resources, e.g., by adding
+            // an explicit `FrameworkInfo` to operations like is
+            // already done for `RunTaskMessage`, see MESOS-8582.
+            if (framework == nullptr) {
+              LOG(WARNING)
+                << "Cannot properly account for operation " << operation.uuid()
+                << " learnt in reconciliation of agent " << slaveId
+                << " since framework " << operation.framework_id()
+                << " is unknown; this can lead to assertion failures after the"
+                   " operation terminates, see MESOS-8536";
+              continue;
+            }
+
+            Try<Resources> consumedResources =
+              protobuf::getConsumedResources(operation.info());
+
+            CHECK_SOME(consumedResources)
+              << "Could not determine resources consumed by operation "
+              << operation.uuid();
+
+            usedByOperations[operation.framework_id()] +=
+              consumedResources.get();
+          }
         }
       }
 

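The accounting rule added by this patch can be illustrated in isolation. The following is a minimal sketch, not the Mesos implementation: `State`, `Operation`, `isTerminal`, and `accountOperations` are hypothetical stand-ins, the set of states is simplified, and a single `double` replaces the `Resources` type. It only shows the control flow: terminal operations are ignored, operations of unknown frameworks are skipped with a warning, and everything else is added to a per-framework total.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for illustration; these are NOT the Mesos types.
enum class State { Pending, Finished, Failed };

struct Operation {
  std::string uuid;
  std::string frameworkId;
  State state;
  double consumedCpus;  // Stand-in for the `Resources` an operation consumes.
};

// Hypothetical analogue of `protobuf::isTerminalState`.
bool isTerminal(State state) {
  return state == State::Finished || state == State::Failed;
}

// Returns the resources used by non-terminal operations, keyed by framework.
// Operations of frameworks not in `knownFrameworks` are skipped with a
// warning, mirroring the `framework == nullptr` branch in the diff.
std::map<std::string, double> accountOperations(
    const std::vector<Operation>& operations,
    const std::set<std::string>& knownFrameworks) {
  std::map<std::string, double> usedByOperations;

  for (const Operation& operation : operations) {
    if (isTerminal(operation.state)) {
      continue;  // Terminal operations no longer use resources.
    }

    if (knownFrameworks.count(operation.frameworkId) == 0) {
      std::cerr << "Cannot properly account for operation "
                << operation.uuid << " since framework "
                << operation.frameworkId << " is unknown" << std::endl;
      continue;
    }

    usedByOperations[operation.frameworkId] += operation.consumedCpus;
  }

  return usedByOperations;
}
```

In this sketch, calling `accountOperations` with two pending operations of a known framework, one finished operation of the same framework, and one pending operation of an unknown framework yields a single entry that sums only the two pending operations of the known framework.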