[
https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964553#comment-13964553
]
Timothy St. Clair edited comment on MESOS-1199 at 4/9/14 7:35 PM:
------------------------------------------------------------------
[~bmahler] I put a tinker-toy example here:
https://github.com/timothysc/tests/tree/master/child_pipes
Obviously it would need to be more elaborate, but it should scale out to a
couple hundred children w/select, or a couple thousand w/epoll.
The basic gist is that every time a child is created, you leave a file
descriptor breadcrumb that the parent can use to monitor, or communicate with,
the child, instead of relying on the pid (b/c it's not a fd). The file
descriptor lets the parent use blocking calls that monitor "sets" of file
descriptors (select/epoll). Thus, one thread can call select on many file
descriptors, and if any one of the children exits, you are notified as fast as
the kernel can pop you out of the select/epoll call, rather than on the next
poll. It's uber-fast/efficient and a typical 1:many process pattern.
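
To make the pattern concrete, here is a minimal sketch in Python rather than
the C of the linked example. It is not the code from that repository, and the
names spawn_child and wait_for_exits are made up for illustration. It assumes
the child is only forked (not exec'd) and holds the only write end of a pipe,
so the parent's read end hits EOF the moment the child exits.

```python
import os
import select
import time


def spawn_child(lifetime):
    """Fork a child that sleeps `lifetime` seconds; return (pid, read_fd).

    Hypothetical helper: the pipe's read end is the "breadcrumb".
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end. The kernel closes it on exit,
        # which makes the parent's read end readable (EOF).
        os.close(read_fd)
        time.sleep(lifetime)
        os._exit(0)
    # Parent: drop its copy of the write end, so the child holds the only
    # one and later children do not inherit it.
    os.close(write_fd)
    return pid, read_fd


def wait_for_exits(children):
    """Block in select() until every child exits.

    `children` maps read_fd -> pid. Returns (pid, exit_code) pairs in the
    order the exits were observed.
    """
    pending = dict(children)
    finished = []
    while pending:
        # One thread watches all breadcrumbs; no polling interval involved.
        readable, _, _ = select.select(list(pending), [], [])
        for fd in readable:
            pid = pending.pop(fd)
            os.close(fd)
            _, status = os.waitpid(pid, 0)  # child is exiting; collect it
            finished.append((pid, os.WEXITSTATUS(status)))
    return finished
```

Calling spawn_child a few times and passing the resulting {read_fd: pid} map
to wait_for_exits returns each child as soon as it dies. Swapping select for
epoll would be the obvious next step for larger child counts.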
> Subprocess is "slow" -> gated by process::reap poll interval
> ------------------------------------------------------------
>
> Key: MESOS-1199
> URL: https://issues.apache.org/jira/browse/MESOS-1199
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 0.18.0
> Reporter: Ian Downes
>
> Subprocess uses process::reap to wait on the subprocess pid and set the exit
> status. However, process::reap polls with a one second interval resulting in
> a delay up to the interval duration before the status future is set.
> This means if you need to wait for the subprocess to complete you get hit
> with E(delay) = 0.5 seconds, independent of the execution time. For example,
> the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the
> executor during launch. At Twitter we fetch a local file, i.e., a very fast
> operation, but the launch is blocked until the mesos-fetcher pid is reaped ->
> adding 0 to 1 seconds for every launch!
> The problem is even worse with a chain of short Subprocesses because after
> the first Subprocess completes you'll be synchronized with the reap interval
> and you'll see nearly the full interval before notification, i.e., 10
> Subprocesses, each of << 1 second duration, will take ~10 seconds!
> This has become particularly apparent in some new tests I'm working on where
> test durations are now greatly extended with each taking several seconds.
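
The delay arithmetic in the description above can be checked with a small
model. This is only a sketch: it assumes the reaper polls at exact multiples of
the interval and that each Subprocess starts the instant the previous one is
reaped. The function names are made up for illustration and are not part of
libprocess.

```python
import math


def notified_at(start, duration, interval=1.0):
    """Time at which a poll-based reaper first observes the exit.

    Assumption: polls fire at exact multiples of `interval`.
    """
    finished = start + duration
    return math.ceil(finished / interval) * interval


def chain_elapsed(first_start, durations, interval=1.0):
    """Wall-clock time for a chain where each step starts once the
    previous one has been reaped."""
    now = first_start
    for duration in durations:
        now = notified_at(now, duration, interval)
    return now - first_start
```

Under those assumptions, chain_elapsed(0.3, [0.01] * 10) comes out at 9.7
seconds for 0.1 seconds of actual work: only the first step benefits from the
random start offset, and every later step pays nearly the full interval.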