Re: Disabling the SIGCHLD handler

Chris Shoemaker Tue, 15 Jan 2008 08:38:51 -0800

On Tue, Jan 15, 2008 at 04:38:54AM +0100, Marc Lehmann wrote:
> On Mon, Jan 14, 2008 at 10:38:51AM -0500, Chris Shoemaker <[EMAIL PROTECTED]> wrote:
> > On Mon, Jan 14, 2008 at 03:06:41AM +0100, Marc Lehmann wrote:
> > Notice that nothing prevents the waitpid from reaping any child at all.
> 
> Right, that's how it was designed.
> 
> > All that is then required is for someone to start a ev_child for the
> > child we just reaped.  That event will never trigger.
> 
> If you start the ev_child handler _after_ handling child events, that's
> true.  You need to start it before. But you do not need to start the child
> handler before creating the process, or before the process exits, only
> before you poll for more events.


I'm glad we finally agree, in practice at least.

I do appreciate that there is a case in which libev _can_ provide the
exit status of a child that exited before the ev_child was started -
when the default event loop has never run between the exiting of the
child and the starting of the ev_child.

In my opinion, this is a rather special and uninteresting exception to
the general rule that running the default event loop will eagerly reap
the exit status of children, even before an ev_child has started.


> >   ev_loop(loop, EVLOOP_NONBLOCK);
> 
> Well, you run the loop before doing that, obviously, you have to start your
> child handler before doing that.

In my case, I may not know that I even care to start an ev_child until a
long, long time after the child has exited.  And, when I do eventually care,
I won't know whether the child has exited or not.  Obviously, I can't
avoid running the default loop indefinitely.

> There really is no other sensible way to do it, and you can always structure
> your program to make it work.

I suppose you mean by using my own sigchld handler.

> > If I understand what you're claiming, then that program should print the
> > exit status 99, and then terminate.  In fact, it does neither.
> 
> I am not claiming anything. You were claiming that you cannot catch the exit
> status of a process that exited before starting an ev_child watcher, and that
> was and is simply untrue.
> 
> example quote: "it does force an application to choose between either:
> a) not having access to the exit status of children that exited before a
> ev_child was started, OR b)" [...]
> 
> And this claim is untrue.

You deleted an important part: "b) not being able to use ev_signal"

That would require running the default event loop.  We agree that a)
is only true if the default event loop is run.  Thus, the two options
are exclusive.  I never claimed A, I claimed A XOR B.

> This works perfectly well:
> 
>   pid = fork ()
>   // child exits here
>   cw.pid = pid;
>   ev_child_start (&cw);
> 
> > Notice that the child (21046) has been reaped, with exit status 99,
> > _before_ the ev_child has been started. 
> 
> In *some* cases this is true, but not in general. Please note that
> registering a watcher "too late" is a problem for all other methods, too:
> every method will fail if you register interest in it too late.

Fair enough.  You could say I'm expecting too much, to be able to get
the exit status of a child that died long before I started an
ev_child.  But, as POSIX allows the waitpid caller to get the exit
status long after the child has died, and as I must offer the POSIX
semantics, I must also allow this.

> > And as a matter of fact, this is the same reason why it would prevent my
> > own waitpid from finding the already-reaped child.
> 
> I cannot comment on your waitpid, but libev certainly allows you to
> register child watchers after the child exited, but before the exit status
> was fetched. This is true for *all* methods, even the one you outlined
> before.
> 
> > I don't follow, really, but here's basically how I would do it:
> 
> that's very slow and causes high overhead. the basic promise of libev is
> that it is efficient, not that it makes dozens/hundreds/thousands of
> syscalls to reap a single child.

It's O(N) in the number of child watchers, and runs only upon SIGCHLD.
Is that really so bad?  I'd love to know of a more efficient way, but
efficiency is secondary to correctness, so even if this handler is
less efficient than libev's, I need it because it doesn't reap the
children eagerly.

> > Perhaps you wanted to avoid handling the ECHILD, but I would need it,
> > and using rpid == -1 seems rather like waitpid returning -1.
> 
> libev handles ECHILD just fine, what makes you think it doesn't
> handle this case?

I meant actually generating an event for ECHILD, which libev doesn't
do, but which I prefer.

> > Now, I realize that this might not offer the behavior you desire in terms
> > of multiple ev_childs registering for the same pid.  But this is ok for
> > me, since I'm fine with the POSIX waitpid semantics of each child only
> > unblocking one waitpid, and other waitpids getting ECHILD.  I don't want
> > it to appear like the child died more than once, even if there are multiple
> > ev_child watchers.
> 
> libev doesn't make it as if a child died more than once, even if there are
> multiple ev_child watchers. this isn't possible with the unix semantics.

Eh?!  Sometimes I wonder if we're reading the same code.  Of course it does!

Look:
************
#include <ev.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

struct ev_child child_watcher1;
struct ev_child child_watcher2;

static void cb (EV_P_ struct ev_child *w, int revents) {
  printf ("w->pid = %d; w->rstatus = %d\n", w->rpid,  WEXITSTATUS(w->rstatus));
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

int main (int argc, char *argv[]) {
  struct ev_loop *loop = ev_default_loop(EV_FORK_ENABLE);
  pid_t pid;

  if ((pid = fork()) == 0) {
    exit(99);
  }
  printf("spawned %d\n", pid);
  ev_child_init (&child_watcher1, cb, pid);
  ev_child_start (loop, &child_watcher1);
  ev_child_init (&child_watcher2, cb, pid);
  ev_child_start (loop, &child_watcher2);

  ev_loop(loop, 0);
  return 0;
}
**********
Two watchers for the same pid.  How many get triggered?

$ ./foo
spawned 24326
w->pid = 24326; w->rstatus = 99
w->pid = 24326; w->rstatus = 99


TWO!  It had better be so, because that's exactly what the code does:

It loops over ev_childs, feeding all the events.  Remember?:

for (w = (ev_child *)childs [chain & (EV_PID_HASHSIZE - 1)]; w;
     w = (ev_child *)((WL)w)->next)
    if (w->pid == pid || !w->pid)
      {
        ev_set_priority (w, ev_priority (sw)); /* need to do it *now* */
        w->rpid    = pid;
        w->rstatus = status;
        ev_feed_event (EV_A_ (W)w, EV_CHILD);
      }

Not that I particularly understand why it was designed to do so.  It
seems a bit odd to me, and it's not really the behavior I want (that's
why I warned you that my patch changed that.)  Understand, I'm not
criticizing the design, different apps have different goals.  Again, I
really have to stick to POSIX semantics, so a child only dies once.

> in fact, your algorithm contains a race condition where an ev_child
> watcher gets the exit status of the wrong process (one that is started
> between the two waitpid calls), something libev avoids.

I assume you're talking about:

pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED | WCONTINUED);
/* What if a new process is started right now? */
if (WCONTINUED && pid < 0 && errno == EINVAL) {
   pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED);
}

I don't understand what you mean about sending the status of the
wrong process.  If the watcher gets an event, I think it's always for
a "matching" process. (It might match the process group, btw.)

> In any case, the best way to proceed would be to go by my original
> recommendation and just use your own sigchld handler.

Thank you for the recommendation.  I will follow it.

> > > And even if, you can always provide your own child reaper. I just do not
> > > see your problem.
> > 
> > I hope I've been clearer?
> 
> Not sure, your claim was wrong and is wrong, whether it became clearer or
> not is not really relevant.
> 
> And the solution is still the same one, and I still haven't heard why you
> don't just use your own sigchld watcher.

Actually, I am.  Somehow I got the impression that you were interested
to know why the existing sigchld watcher was unsuitable for our
purposes.  Reviewing this thread, I realize that I probably imagined
that, prolonging this thread unnecessarily.  Sorry 'bout that. :/

> > > No, signals are an unsharable resource just like sigchld. It just cannot
> > > be done with posix, sorry.
> > 
> > I do realize it couldn't be shared, I meant to offer another loop type
> > to be used _instead_ of the default loop.
> 
> And on what grounds? Couldn't you just tell me why providing your own sigchld
> handler wouldn't work?

You've made your point very clear.  Thank you.

> > > > Instead, I loop over only the list of outstanding calls to waitpid,
> > > 
> > > I assume you do this on every call to waitpid, too...
> > 
> > No, on every SIGCHLD.
> 
> Then you have a bug, as you could have received the SIGCHLD earlier.

Oh, sorry, I misunderstood.  Yes, on every _virtual_ waitpid, and on
every sigchld - exactly.  (I thought you meant the real waitpid syscall.)

> libev handles this by not calling waitpid unless told to, i.e., outside the
> sigchld handler.
> 
> > > > realized that it would be quite easy to modify libev to provide the
> > > > behavior I want.  I would just remove the waitpid(-1) call, and put a
> > > > waitpid(pid) call inside the loop over childs[].  As an added benefit,
> > > 
> > > That would break it, however.
> > 
> > I guess that depends on your definition of "break".
> 
> My definition of break is that it breaks the documented libev API, so
> no need to put "break" into quotes: this is the libev mailinglist, and
> breaking obviously means breaking the designed behaviour of libev (whether
> documented or not).

Very well.

> > It would, however, function exactly the same as my "legacy" sigchld
> > handler, which is good for me at least. :)
> 
> You couldn't wait race-free for any child, and which watcher gets invoked
> is then a matter of registration order. That's not acceptable to me. The
> point of libev is to provide a generic interface that doesn't suffer from
> races or non-deterministic child reaping. It also shouldn't have O(n)
> complexity with a high constant factor due to the syscall-per-pid.

I still don't understand the allusion to race, but you're absolutely
right about registration order (but that's not non-deterministic, is
it?)

I concede that my needs are significantly different from libev's
sigchld handler, and that my sigchld handler will make O(n_watchers *
n_sigchlds) waitpid calls while libev's will only make O(1 *
n_sigchlds).

I will follow your recommendation to implement my own sigchld handler.

-chris

_______________________________________________
libev mailing list
[email protected]
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev
