Fixed a bug in standalone container recovery.

The pid of a standalone container may not be known during recovery. For instance, the agent may crash after the containerizer checkpoints the runtime directory but before it checkpoints the pid. In that case, we assume that the child will exit because of the broken control pipe, so we can destroy the container without recovering it.

Review: https://reviews.apache.org/r/64623

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/25761177
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/25761177
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/25761177

Branch: refs/heads/master
Commit: 2576117702cb021e0ad2c873697bfd8201b1ee37
Parents: 699a5d2
Author: Jie Yu <[email protected]>
Authored: Tue Dec 12 19:19:17 2017 -0800
Committer: Jie Yu <[email protected]>
Committed: Fri Dec 15 09:26:55 2017 -0800

----------------------------------------------------------------------
 src/slave/containerizer/mesos/containerizer.cpp | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/25761177/src/slave/containerizer/mesos/containerizer.cpp
----------------------------------------------------------------------
diff --git a/src/slave/containerizer/mesos/containerizer.cpp b/src/slave/containerizer/mesos/containerizer.cpp
index 7ab0b07..5cb36af 100644
--- a/src/slave/containerizer/mesos/containerizer.cpp
+++ b/src/slave/containerizer/mesos/containerizer.cpp
@@ -910,10 +910,15 @@ Future<Nothing> MesosContainerizerProcess::recover(
           !containerizer::paths::getContainerForceDestroyOnRecovery(
               flags.runtime_dir, containerId);
 
+      const bool isRecoverableStandaloneContainer =
+        isStandaloneContainer && pid.isSome();
+
       // Add recoverable nested containers or standalone containers
       // to the list of 'ContainerState'.
-      if (isRecoverableNestedContainer || isStandaloneContainer) {
-        CHECK_SOME(directory);
+      if (isRecoverableNestedContainer || isRecoverableStandaloneContainer) {
+        CHECK_SOME(container->directory);
+        CHECK_SOME(container->pid);
+
         ContainerState state =
           protobuf::slave::createContainerState(
               None(),

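For readers without the Mesos tree at hand, the decision this patch introduces can be sketched outside Mesos as follows. This is only an illustration under stated assumptions: `CheckpointedContainer` and `isRecoverableStandalone` are hypothetical names, `std::optional` stands in for the `Option` type used in Mesos, and `assert` stands in for `CHECK_SOME`. It is not the actual containerizer code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for what the containerizer finds on disk for one
// container during recovery. Not a Mesos type.
struct CheckpointedContainer {
  bool standalone = false;
  std::optional<std::string> directory;  // Checkpointed runtime directory.
  std::optional<int> pid;                // Empty if the agent crashed before
                                         // the pid was checkpointed.
};

// Mirrors the condition added by the patch: a standalone container is
// recoverable only if its pid was checkpointed. A container that fails this
// test is left to the destroy path instead of being recovered.
inline bool isRecoverableStandalone(const CheckpointedContainer& c) {
  const bool recoverable = c.standalone && c.pid.has_value();

  if (recoverable) {
    // Analogous to the CHECK_SOME calls in the patch: both values must be
    // present before any container state is built from them.
    assert(c.directory.has_value());
    assert(c.pid.has_value());
  }

  return recoverable;
}
```

A standalone container with both a directory and a pid passes the check. One with only a directory, which is the crash window described in the commit message, does not.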