[
https://issues.apache.org/jira/browse/MESOS-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881436#comment-13881436
]
Benjamin Mahler commented on MESOS-943:
---------------------------------------
Thanks for bringing this up, Nikita!
For (3) above: From the libprocess point of view, we don't want to reap all
children by default. This is done in some other libraries, like libev, and it
appears to be fairly common for users to override this behavior (see Tim's
comments in MESOS-895).
For (1) above: Our code is designed to be asynchronous which means blocking
worker threads is problematic. So, we'd like to provide an asynchronous
notification mechanism for reaping.
For (2) above: This is the modified approach I posted in my reviews, but as you
mentioned, it comes with the downside of a busy loop. I'll include some comments
in the code on how to optimize this, which brings up a 4th option:
4. Provide a process::reap(pid_t) utility that uses one thread per watched pid.
Each thread makes a blocking call to waitpid(). This avoids the busy loop, but
I will hold off on making such an optimization for now.
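To make option 4 concrete, here is a rough sketch of the thread-per-pid idea.
It assumes plain C++11 std::thread and std::future rather than libprocess's own
Future, and the free function reap() below is a hypothetical stand-in, not the
actual process::reap(pid_t) signature:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper (illustration only, not the libprocess API).
// Returns a future that is set to the raw wait status once `pid` exits,
// or to -1 if waitpid() fails (e.g. `pid` is not our child).
std::future<int> reap(pid_t pid)
{
  std::promise<int> promise;
  std::future<int> future = promise.get_future();

  // One detached thread per watched pid. The thread blocks inside
  // waitpid(), so there is no polling loop.
  std::thread([pid](std::promise<int> p) {
    int status = 0;
    pid_t result;
    do {
      // Wait only for this specific pid; retry if a signal interrupts us.
      result = ::waitpid(pid, &status, 0);
    } while (result == -1 && errno == EINTR);

    p.set_value(result == pid ? status : -1);
  }, std::move(promise)).detach();

  return future;
}
```

A caller would fork(), pass the child's pid to reap(), and later inspect the
result with WIFEXITED() / WEXITSTATUS(). Because each thread waits on exactly
one pid, it does not steal the exit status of children owned by other
components. The trade-off is one blocked thread per outstanding child.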
> Provide an abstraction for asynchronous launching of subprocesses.
> ------------------------------------------------------------------
>
> Key: MESOS-943
> URL: https://issues.apache.org/jira/browse/MESOS-943
> Project: Mesos
> Issue Type: Improvement
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
>
> This has come up during [~idownes]'s changes to add containerization.
> We would like to be able to run commands asynchronously like:
> {{curl -O http://foo.com/bigfile.zip}}
> Currently, there is not an easy way to do this while having:
> 1. A Future handle on the exit status of the subprocess.
> 2. The means to 'discard' the future and consequently kill the subprocess
> (e.g. stalled hadoop command).
> 3. Handles to stdin, stdout, stderr of the subprocess.
> The first issue is that we need to re-work the Reaper to not reap _all_
> subprocesses. Rather, we need to allow other components to reap their own
> forked subprocesses without the slave's Reaper "stealing" the exit status
> information. I've proposed that we move the Reaper into libprocess initially
> with the only change being to reap the desired pids. (We can optimize this
> later using a per-pid blocking thread or SIGCHLD).
> One concern is that if we 'leak' child processes by accidentally not reaping,
> we may fill the process table with zombie processes. However, we have tight
> control over where our code performs forks, and can enforce proper reaping.