On Tue, Jan 15, 2008 at 04:38:54AM +0100, Marc Lehmann wrote:
> On Mon, Jan 14, 2008 at 10:38:51AM -0500, Chris Shoemaker <[EMAIL PROTECTED]>
> wrote:
> > On Mon, Jan 14, 2008 at 03:06:41AM +0100, Marc Lehmann wrote:
> > Notice that nothing prevents the waitpid from reaping any child at all.
>
> Right, that's how it was designed.
>
> > All that is then required is for someone to start a ev_child for the
> > child we just reaped. That event will never trigger.
>
> If you start the ev_child handler _after_ handling child events, that's
> true. You need to start it before. But you do not need to start the child
> handler before creating the process, or before the process exits, only
> before you poll for more events.
I'm glad we finally agree, in practice at least.
I do appreciate that there is a case in which libev _can_ provide the
exit status of a child that exited before the ev_child was started:
when the default event loop has never run between the exiting of the
child and the starting of the ev_child.
In my opinion, this is a rather special and uninteresting exception to
the general rule that running the default event loop will eagerly reap
the exit status of children, even before an ev_child has started.
> > ev_loop(loop, EVLOOP_NONBLOCK);
>
> Well, you run the loop before doing that, obviously, you have to start your
> child handler before doing that.
In my case, I may not know that I even care to start an ev_child until a
long, long time after the child has exited. And, when I do eventually care,
I won't know whether the child has exited or not. Obviously, I can't
put off running the default loop indefinitely.
> There really is no other sensible way to do it, and you can always structure
> your program to make it work.
I suppose you mean by using my own sigchld handler.
> > If I understand what you're claiming, then that program should print the
> > exit status 99, and then terminate. In fact, it does neither.
>
> I am not claiming anything. You were claiming that you cannot catch the exit
> status of a process that exited before starting an ev_child watcher, and that
> was and is simply untrue.
>
> example quote: "it does force an application to choose between either:
> a) not having access to the exit status of children that exited before a
> ev_child was started, OR b)" [...]
>
> And this claim is untrue.
You deleted an important part: "b) not being able to use ev_signal".
Being able to use ev_signal would require running the default event
loop. We agree that a) is only true if the default event loop is run.
Thus, the two options are exclusive. I never claimed a) alone; I
claimed a) XOR b).
> This works perfectly well:
>
> pid = fork ();
> // child exits here
> cw.pid = pid;
> ev_child_start (&cw);
>
> > Notice that the child (21046) has been reaped, with exit status 99,
> > _before_ the ev_child has been started.
>
> In *some* cases this is true, but not in general. Please note that
> registering a watcher "too late" is a problem for all other methods, too:
> every method will fail if you register interest in it too late.
Fair enough. You could say I'm expecting too much, to be able to get
the exit status of a child that died long before I started an
ev_child. But, as POSIX allows the waitpid caller to get the exit
status long after the child has died, and as I must offer the POSIX
semantics, I must also allow this.
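For comparison, plain POSIX keeps a terminated child around as a zombie
until somebody waits for it, so its status can be collected arbitrarily
late, provided nothing reaped it in the meantime. A minimal sketch:
`late_status` is a hypothetical helper, the `sleep` only stands in for
"a long time later", and it assumes SIGCHLD is not set to `SIG_IGN`
(which on many systems makes the kernel discard the child's status).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn a child that exits immediately with `code`,
 * do unrelated work for `delay` seconds, and only then ask for that
 * child's status. Returns the exit status, or -1 on error.
 * Assumes SIGCHLD is at its default disposition, not SIG_IGN. */
static int late_status (int code, unsigned delay)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);

  sleep (delay);                /* the child is a zombie by now */

  int status;
  /* Waiting for this specific pid leaves any other children alone. */
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```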
> > And as a matter of fact, this is the same reason why it would prevent my
> > own waitpid from finding the already-reaped child.
>
> I cannot comment on your waitpid, but libev certainly allows you to
> register child watchers after the child exited, but before the exit status
> was fetched. This is true for *all* methods, even the one you outlined
> before.
>
> > I don't follow, really, but here's basically how I would do it:
>
> that's very slow and causes high overhead. the basic promise of libev is
> that it is efficient, not that it makes dozens/hundreds/thousands of
> syscalls to reap a single child.
It's O(N) in the number of child watchers, and runs only upon SIGCHLD.
Is that really so bad? I'd love to know of a more efficient way, but
efficiency is secondary to correctness, so even if this handler is
less efficient than libev's, I need it because it doesn't reap the
children eagerly.
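The lazy handler described above could look roughly like this. The
names (`struct vwait`, `scan_outstanding`) and the plain array are
hypothetical; the essential property is that only pids somebody is
actually waiting for are passed to `waitpid`, one `WNOHANG` call per
outstanding wait, so children nobody has asked about stay waitable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record for one outstanding "virtual waitpid". */
struct vwait {
  pid_t pid;      /* child we are waiting for */
  int   done;     /* set once the status has been collected */
  int   status;   /* raw status from waitpid */
};

/* Call this from the event loop whenever SIGCHLD has been seen.
 * Cost: one waitpid syscall per outstanding wait, i.e. O(N).
 * Returns how many waits completed in this pass. */
static size_t scan_outstanding (struct vwait *waits, size_t n)
{
  size_t completed = 0;
  for (size_t i = 0; i < n; ++i)
    {
      if (waits[i].done)
        continue;
      int status;
      pid_t r = waitpid (waits[i].pid, &status, WNOHANG);
      if (r == waits[i].pid)
        {
          waits[i].status = status;
          waits[i].done = 1;
          ++completed;
        }
      /* r == 0: still running; r < 0 (e.g. ECHILD): left pending here */
    }
  return completed;
}
```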
> > Perhaps you wanted to avoid handling the ECHILD, but I would need it,
> > and using rpid == -1 seems rather like waitpid returning -1.
>
> libev handles ECHILD just fine, what makes you think it doesn't
> handle this case?
I meant actually generating an event for ECHILD, which libev doesn't
do, but which I prefer.
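One way to surface `ECHILD` as an event in its own right is to classify
every non-blocking poll into distinct outcomes. This is a hypothetical
sketch (`enum child_state`, `poll_child`), not part of libev's API.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome of one non-blocking poll for a specific pid. */
enum child_state {
  CHILD_RUNNING,  /* still alive (or stopped) and not yet reaped */
  CHILD_EXITED,   /* reaped now; *status is valid */
  CHILD_GONE,     /* ECHILD: not our child, or already reaped by someone */
  CHILD_ERROR     /* any other waitpid failure */
};

static enum child_state poll_child (pid_t pid, int *status)
{
  pid_t r;
  do
    r = waitpid (pid, status, WNOHANG);
  while (r < 0 && errno == EINTR);   /* retry if a signal interrupted us */

  if (r == pid)
    return CHILD_EXITED;
  if (r == 0)
    return CHILD_RUNNING;
  return errno == ECHILD ? CHILD_GONE : CHILD_ERROR;
}
```

A second `poll_child` for a pid that has already been reaped returns
`CHILD_GONE`, which matches the POSIX rule that only one waitpid gets a
given child's status.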
> > Now, I realize that this might not offer the behavior you desire in terms
> > of multiple ev_childs registering for the same pid. But this is ok for
> > me, since I'm fine with the POSIX waitpid semantics of each child only
> > unblocking one waitpid, and other waitpids getting ECHILD. I don't want
> > it to appear like the child died more than once, even if there are multiple
> > ev_child watchers.
>
> libev doesn't make it as if a child died more than once, even if there are
> multiple ev_child watchers. this isn't possible with the unix semantics.
Eh?! Sometimes I wonder if we're reading the same code. Of course it does!
Look:
************
#include <ev.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

struct ev_child child_watcher1;
struct ev_child child_watcher2;

/* Invoked once per watcher; both watchers below match the same pid. */
static void cb (EV_P_ struct ev_child *w, int revents) {
  /* the label says w->pid, but this prints the reaped pid (w->rpid) */
  printf ("w->pid = %d; w->rstatus = %d\n", w->rpid, WEXITSTATUS(w->rstatus));
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

int main (int argc, char *argv[]) {
  /* ev_child watchers need the default loop; EVFLAG_AUTO picks the backend */
  struct ev_loop *loop = ev_default_loop (EVFLAG_AUTO);
  pid_t pid;

  if ((pid = fork ()) == 0) {
    exit (99);                  /* child: exit immediately with status 99 */
  }
  printf ("spawned %d\n", pid);

  /* two independent watchers for the very same child */
  ev_child_init (&child_watcher1, cb, pid);
  ev_child_start (loop, &child_watcher1);
  ev_child_init (&child_watcher2, cb, pid);
  ev_child_start (loop, &child_watcher2);

  ev_loop (loop, 0);
  return 0;
}
**********
Two watchers for the same pid. How many get triggered?
$ ./foo
spawned 24326
w->pid = 24326; w->rstatus = 99
w->pid = 24326; w->rstatus = 99
TWO! It had better be so, because that's exactly what the code does:
It loops over ev_childs, feeding all the events. Remember?:
for (w = (ev_child *)childs [chain & (EV_PID_HASHSIZE - 1)]; w; w = (ev_child *)((WL)w)->next)
  if (w->pid == pid || !w->pid)
    {
      ev_set_priority (w, ev_priority (sw)); /* need to do it *now* */
      w->rpid = pid;
      w->rstatus = status;
      ev_feed_event (EV_A_ (W)w, EV_CHILD);
    }
Not that I particularly understand why it was designed to do so. It
seems a bit odd to me, and it's not really the behavior I want (that's
why I warned you that my patch changed that.) Understand, I'm not
criticizing the design, different apps have different goals. Again, I
really have to stick to POSIX semantics, so a child only dies once.
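A dispatch that keeps the "a child only dies once" semantics might hand
each status to the first matching watcher and retire it, instead of
feeding every match as the libev loop quoted above does. This is a
sketch with hypothetical types (`struct watcher`, `dispatch_once`); as
in that excerpt, a watcher pid of 0 means "any child".

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical watcher record in a singly linked list. */
struct watcher {
  pid_t pid;              /* pid to match, or 0 for "any child" */
  pid_t rpid;             /* filled in on delivery */
  int   rstatus;
  int   fired;
  struct watcher *next;
};

/* Deliver (pid, status) to at most one watcher: the first match in list
 * order. The matched watcher is unlinked so it cannot fire again.
 * Returns the watcher that received it, or NULL if none matched. */
static struct watcher *dispatch_once (struct watcher **head, pid_t pid, int status)
{
  for (struct watcher **pp = head; *pp; pp = &(*pp)->next)
    {
      struct watcher *w = *pp;
      if (w->pid == pid || w->pid == 0)
        {
          *pp = w->next;          /* unlink: one child, one delivery */
          w->next = NULL;
          w->rpid = pid;
          w->rstatus = status;
          w->fired = 1;
          return w;
        }
    }
  return NULL;
}
```

Which watcher wins is then decided by list order, i.e. registration
order.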
> in fact, your algorithm contains a race condition where an ev_child
> watcher gets the exit status of the wrong process (one that is started
> between the two waitpid calls), something libev avoids.
I assume you're talking about:
pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED | WCONTINUED);
/* What if a new process is started right now? */
if (WCONTINUED && pid < 0 && errno == EINVAL) {
  pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED);
}
I don't understand what you mean about sending the status of the
wrong process. If the watcher gets an event, I think it's always for
a "matching" process. (It might match the process group, btw.)
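Stripped down, the fallback in that excerpt amounts to a helper like the
one below (the name `waitpid_compat` and the `#define` guard for headers
lacking `WCONTINUED` are assumptions). The comment spells out how POSIX
`waitpid` interprets its pid argument, which is where "matching" gets
broad: a pid of 0 or less can match more than one child.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef WCONTINUED
# define WCONTINUED 0   /* assumption: treat a missing flag as "unsupported" */
#endif

/* Non-blocking wait that also reports stop/continue where supported.
 * pid > 0: that child; pid == 0: any child in our process group;
 * pid == -1: any child; pid < -1: any child in process group -pid. */
static pid_t waitpid_compat (pid_t pid, int *status)
{
  pid_t r = waitpid (pid, status, WNOHANG | WUNTRACED | WCONTINUED);
  if (WCONTINUED && r < 0 && errno == EINVAL)   /* flag rejected: retry without it */
    r = waitpid (pid, status, WNOHANG | WUNTRACED);
  return r;
}
```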
> In any case, the best way to proceed would be to go by my original
> recommendation and just use your own sigchld handler.
Thank you for the recommendation. I will follow it.
> > > And even if, you can always provide your own child reaper. I just do not
> > > see your problem.
> >
> > I hope I've been clearer?
>
> Not sure, your claim was wrong and is wrong, whether it became clearer or
> not is not really relevant.
>
> And the solution is still the same one, and I still haven't heard why you
> don't just use your own sigchld watcher.
Actually, I am. Somehow I got the impression that you were interested
to know why the existing sigchld watcher was unsuitable for our
purposes. Reviewing this thread, I realize that I probably imagined
that, prolonging this thread unnecessarily. Sorry 'bout that. :/
> > > No, signals are an unsharable resource just like sigchld. It just cannot
> > > be done with posix, sorry.
> >
> > I do realize it couldn't be shared, I meant to offer another loop type
> > to be used _instead_ of the default loop.
>
> And on what grounds? Couldn't you just tell me why providing your own sigchld
> handler wouldn't work?
You've made your point very clear. Thank you.
> > > > Instead, I loop over only list of outstanding calls to waitpid,
> > >
> > > I assume you do this on every call to waitpid, too...
> >
> > No, on every SIGCHLD.
>
> Then you have a bug, as you could have received the SIGCHLD earlier.
Oh, sorry, I misunderstood. Yes, on every _virtual_ waitpid, and on
every sigchld - exactly. (I thought you meant the real waitpid syscall.)
> libev handles this by not calling waitpid unless told to, i.e., outside the
> sigchld handler.
>
> > > > realized that it would be quite easy to modify libev to provide the
> > > > behavior I want. I would just remove the waitpid(-1) call, and put a
> > > > waitpid(pid) call inside the loop over childs[]. As an added benefit,
> > >
> > > That would break it, however.
> >
> > I guess that depends on your definition of "break".
>
> My definition of break is that it breaks the documented libev API, so
> no need to put "break" into quotes: this is the libev mailing list, and
> breaking obviously means breaking the designed behaviour of libev (whether
> documented or not).
Very well.
> > It would, however, function exactly the same as my "legacy" sigchld
> > handler, which is good for me at least. :)
>
> You couldn't wait race-free for any child, and which watcher gets invoked
> is then a matter of registration order. That's not acceptable to me. The
> point of libev is to provide a generic interface that doesn't suffer from
> races or non-deterministic child reaping. It also shouldn't have O(n)
> complexity with a high constant factor due to the syscall-per-pid.
I still don't understand the allusion to race, but you're absolutely
right about registration order (but that's not non-deterministic, is
it?)
I concede that my needs differ significantly from what libev's
sigchld handler provides, and that my sigchld handler will make
O(n_watchers * n_sigchlds) waitpid calls while libev's will only make
O(n_sigchlds).
I will follow your recommendation to implement my own sigchld handler.
-chris
_______________________________________________
libev mailing list
[email protected]
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev