Fixed handling of terminal operations in `updateSlave` handler. An offer operation can be become terminal between any previously received non-terminal offer operation status update and receiving an `UpdateSlaveMessage` (e.g., if the agent failed over, or when the agent was partitioned from the master).
The master will in its offer operations status handler attempt to apply operations which became terminal since the last update. At the same time, the total resources in an `UpdateSlaveMessage` would already contain the result of applying the operation, and we need to prevent the master from attempting to apply the same operation twice. This patch updates the master handler for `UpdateSlaveMessage` to transition pending operations which are reported as terminal without also updating the resources on the agent as any update would already be reflected in the new total from the `UpdateSlaveMessage. Review: https://reviews.apache.org/r/65072/ Project: http://git-wip-us.apache.org/repos/asf/mesos/repo Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/899efac6 Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/899efac6 Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/899efac6 Branch: refs/heads/1.5.x Commit: 899efac6c12dfa4aa714e44a53312171aedd795d Parents: 7ffe111 Author: Benjamin Bannier <[email protected]> Authored: Fri Jan 12 15:40:46 2018 -0800 Committer: Greg Mann <[email protected]> Committed: Fri Jan 12 17:17:39 2018 -0800 ---------------------------------------------------------------------- src/master/master.cpp | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mesos/blob/899efac6/src/master/master.cpp ---------------------------------------------------------------------- diff --git a/src/master/master.cpp b/src/master/master.cpp index 1564506..ea98c38 100644 --- a/src/master/master.cpp +++ b/src/master/master.cpp @@ -7660,6 +7660,39 @@ void Master::updateSlave(UpdateSlaveMessage&& message) ->CopyFrom(providerId.get()); } } + + // If a known operation became terminal between any previous offer + // operation status update and this `UpdateSlaveMessage`, the total + // resources we were sent already had the operation applied. We need + // to update the state of the operation to terminal here so that any + // update sent by the agent later does not cause us to apply the + // operation again. + if (provider.newOperations.isSome() && + provider.newOperations->contains(uuid)) { + Option<Operation> oldOperation = provider.oldOperations->get(uuid); + Option<Operation> newOperation = provider.newOperations->get(uuid); + + CHECK_SOME(oldOperation); + CHECK_SOME(newOperation); + + if (!protobuf::isTerminalState( + oldOperation->latest_status().state()) && + protobuf::isTerminalState( + newOperation->latest_status().state())) { + Operation* operation = CHECK_NOTNULL(slave->getOperation(uuid)); + + UpdateOperationStatusMessage update = + protobuf::createUpdateOperationStatusMessage( + uuid, + newOperation->latest_status(), + newOperation->latest_status(), + operation->framework_id(), + operation->slave_id()); + + updateOperation( + operation, update, false); // Do not update resources. + } + } } }
