Ian, process::reap always uses waitpid(), since we never implemented the
thread-per-pid wait() optimization.
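
For anyone following the thread, here is a rough sketch of what polling
with waitpid() looks like. It is Python rather than libprocess C++, and
reap_by_polling and its interval parameter are made-up names for
illustration only, not the actual process::reap code. It also assumes the
pid is a direct child of the caller.

```python
import os
import time


def reap_by_polling(pid, interval=1.0):
    """Poll a child pid with waitpid(WNOHANG) until it has exited.

    Returns (exit_code, seconds_waited). exit_code is None if the child
    did not exit normally (e.g. it was killed by a signal).
    """
    start = time.monotonic()
    while True:
        # WNOHANG makes waitpid return (0, 0) right away if the child
        # is still running, instead of blocking until it exits.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
            return code, time.monotonic() - start
        # The child may exit at any point during this sleep; we only
        # find out at the next poll.
        time.sleep(interval)


if __name__ == "__main__":
    child = os.fork()
    if child == 0:
        os._exit(0)  # child: exit immediately
    code, waited = reap_by_polling(child, interval=1.0)
    print("exit code %s noticed after %.2f seconds" % (code, waited))
```

Even though the child exits almost immediately, the caller usually does
not hear about it until the next poll, which is the delay the ticket
describes.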


On Wed, Apr 9, 2014 at 4:41 PM, Ian Downes (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964816#comment-13964816]
>
> Ian Downes commented on MESOS-1199:
> -----------------------------------
>
> [~tknaup] had that slave been restarted? If not, then process::reap is
> using wait() so the above discussion doesn't apply. If so, then this is
> much higher than I had expected based on a 1 second poll of 100 executors.
> Nothing in the perf output jumps out; perhaps this is something else?
>
> We don't have any performance tests around number of executors. Is this
> something that you can put into a test?
>
>
>
> > Subprocess is "slow" -> gated by process::reap poll interval
> > ------------------------------------------------------------
> >
> >                 Key: MESOS-1199
> >                 URL: https://issues.apache.org/jira/browse/MESOS-1199
> >             Project: Mesos
> >          Issue Type: Improvement
> >    Affects Versions: 0.18.0
> >            Reporter: Ian Downes
> >
> > Subprocess uses process::reap to wait on the subprocess pid and set the
> > exit status. However, process::reap polls with a one second interval,
> > resulting in a delay of up to the interval duration before the status
> > future is set.
> > This means if you need to wait for the subprocess to complete you get
> > hit with E(delay) = 0.5 seconds, independent of the execution time. For
> > example, the MesosContainerizer uses mesos-fetcher in a Subprocess to
> > fetch the executor during launch. At Twitter we fetch a local file, i.e.,
> > a very fast operation, but the launch is blocked until the mesos-fetcher
> > pid is reaped -> adding 0 to 1 seconds for every launch!
> > The problem is even worse with a chain of short Subprocesses because
> > after the first Subprocess completes you'll be synchronized with the reap
> > interval and you'll see nearly the full interval before notification,
> > i.e., 10 Subprocesses each of << 1 second duration will take ~10 seconds!
> > This has become particularly apparent in some new tests I'm working on
> > where test durations are now greatly extended with each taking several
> > seconds.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
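
The delay arithmetic in the quoted description can be checked with a small
model. This is only a sketch under simplifying assumptions: polls happen on
a fixed grid (0, interval, 2 * interval, ...), a poll takes no time, and
the next subprocess starts the instant the previous one is reaped. The
function names are hypothetical and do not come from Mesos.

```python
import math


def notified_at(finish_time, interval=1.0):
    """Time of the first poll tick at or after finish_time."""
    return math.ceil(finish_time / interval) * interval


def mean_delay(interval=1.0, samples=1000):
    """Average notification delay for finish times spread evenly
    across one poll interval."""
    total = 0.0
    for k in range(samples):
        finish = (k + 0.5) * interval / samples
        total += notified_at(finish, interval) - finish
    return total / samples


def chain_total(start, duration, count, interval=1.0):
    """Wall-clock time for `count` subprocesses run back to back, each
    started as soon as the previous one has been reaped."""
    now = start
    for _ in range(count):
        now = notified_at(now + duration, interval)
    return now - start


if __name__ == "__main__":
    print("mean delay:", mean_delay())
    print("10 x 10ms chain:", chain_total(0.3, 0.01, 10))
```

In this model the mean delay comes out at half the interval, and a chain of
ten 10 ms subprocesses started 0.3 seconds after a tick takes roughly 9.7
seconds, since every subprocess after the first waits almost a full
interval.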
