Windows: Tied task lifetimes to executors.

To enable recovery of checkpointed tasks, the agent must be able to die
without also killing its executors and their tasks, so we cannot set the
"job object kill on close" limit unconditionally. However, each executor
must still kill its task when the executor itself dies, so we explicitly
enable this limit through a parent hook when launching the container for
the task. This way, the agent can be restarted (e.g., for an upgrade)
without killing the executors, while an executor that dies
catastrophically still takes its task down with it.
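
The lifetime rule described above can be illustrated with a toy model.
This is only a sketch and not the Mesos or Win32 implementation: `Process`,
`Job`, and `childSurvivesOwnerDeath` are hypothetical names invented for
illustration, and the `Job` destructor stands in for the last handle to a
job object being closed when its owner dies. On Windows itself the
corresponding job object limit flag is `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a process; `alive` becomes false once killed.
struct Process
{
  bool alive = true;
};

// Hypothetical stand-in for a job object. Destroying the `Job` models the
// owner dying and its (last) handle to the job being closed. Members are
// killed at that point only if the "kill on close" limit was set.
class Job
{
public:
  void assign(std::shared_ptr<Process> process)
  {
    members.push_back(std::move(process));
  }

  void setKillOnClose() { killOnClose = true; }

  ~Job()
  {
    if (killOnClose) {
      for (auto& member : members) {
        member->alive = false;
      }
    }
  }

private:
  bool killOnClose = false;
  std::vector<std::shared_ptr<Process>> members;
};

// Returns whether a child placed in its owner's job is still alive after
// the owner (and therefore the job handle) goes away.
bool childSurvivesOwnerDeath(bool killOnClose)
{
  auto child = std::make_shared<Process>();
  {
    Job job; // Held by the owner for as long as the owner lives.
    job.assign(child);
    if (killOnClose) {
      job.setKillOnClose();
    }
  } // Owner dies here; the job's last handle is closed.
  return child->alive;
}
```

In this model, the agent/executor relationship corresponds to
`childSurvivesOwnerDeath(false)` (the executor outlives the agent), and the
executor/task relationship corresponds to `childSurvivesOwnerDeath(true)`
(the task is killed along with the executor).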

Review: https://reviews.apache.org/r/65400


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/42d57869
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/42d57869
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/42d57869

Branch: refs/heads/master
Commit: 42d57869b46fe2333fb3c0ac43572c95d0ac577c
Parents: 65df55a
Author: Andrew Schwartzmeyer <and...@schwartzmeyer.com>
Authored: Wed Jan 17 13:45:03 2018 -0800
Committer: Andrew Schwartzmeyer <and...@schwartzmeyer.com>
Committed: Fri Feb 9 11:55:15 2018 -0800

----------------------------------------------------------------------
 src/launcher/executor.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/42d57869/src/launcher/executor.cpp
----------------------------------------------------------------------
diff --git a/src/launcher/executor.cpp b/src/launcher/executor.cpp
index 050f5a0..164ecc7 100644
--- a/src/launcher/executor.cpp
+++ b/src/launcher/executor.cpp
@@ -62,6 +62,9 @@
 #include <stout/os/environment.hpp>
 #include <stout/os/kill.hpp>
 #include <stout/os/killtree.hpp>
+#ifdef __WINDOWS__
+#include <stout/windows/os.hpp>
+#endif // __WINDOWS__
 
 #include "checks/checker.hpp"
 #include "checks/health_checker.hpp"
@@ -485,6 +488,11 @@ protected:
     vector<process::Subprocess::ParentHook> parentHooks;
 #ifdef __WINDOWS__
     parentHooks.emplace_back(Subprocess::ParentHook::CREATE_JOB());
+    // Setting the "kill on close" job object limit ties the lifetime of the
+    // task to that of the executor. This ensures that if the executor exits,
+    // its task exits too.
+    parentHooks.emplace_back(Subprocess::ParentHook(
+        [](pid_t pid) { return os::set_job_kill_on_close_limit(pid); }));
 #endif // __WINDOWS__
 
     Try<Subprocess> s = subprocess(
