For as long as I can remember, there have been circumstances where
AOLserver doesn't reap zombie processes, and there have been
circumstances where AOLserver doesn't restart system calls that error
with EINTR, to the enduring confusion of new users.
For some reason, these problems seem to occur frequently on *BSD,
giving these operating systems an undeserved black eye. (Any theories
why?)
So far as I can tell from comments on openacs.org, these problems
still occur in AOLserver 3.4.
Fixes to these problems have often been non-fixes, such as handling
EINTR explicitly for a single system call, or calling waitpid() for a
single instance of fork().
So I ask:
(1) Why can't the zombie problem be *really* fixed by having the
SIGCHLD handler cause the following standard Unix idiom to be
executed?
    while ((pid = wait4(-1, &status, WNOHANG, &ru)) > 0) {
        ;   /* keep reaping until no exited children remain */
    }
AFAICT, AOLserver keeps a list of ``known'' children that it waits on
periodically (to collect exit status?), but some children must not be
on that list, or people wouldn't be getting zombies. Unknown children
still need to be waited for.
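A minimal sketch of the idea in (1), in C, assuming a POSIX system. It swaps POSIX waitpid() in for BSD wait4(), since the rusage isn't needed here and waitpid() is on POSIX's async-signal-safe list. The names reap_children, install_sigchld_reaper, spawn_and_collect and reaped_count are hypothetical, not AOLserver's. The handler loops because several child exits can be coalesced into a single SIGCHLD.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counter so callers can see how many children were reaped. */
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: reap every exited child, whether or not anyone knows
 * about it. Several exits may be coalesced into one SIGCHLD, hence the loop. */
static void reap_children(int signo)
{
    int saved_errno = errno;   /* waitpid() may clobber errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* SA_RESTART: restart slow system calls interrupted by SIGCHLD.
 * SA_NOCLDSTOP: no SIGCHLD for children that merely stop. */
static int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo: fork n children that exit at once, then poll for up to ~2 s
 * until the handler has reaped them all. Returns how many it reaped. */
static int spawn_and_collect(int n)
{
    int start = reaped_count;
    struct timespec ts = {0, 10 * 1000 * 1000};   /* 10 ms */

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);                  /* child: exit immediately */
    }
    for (int tries = 0; tries < 200 && reaped_count - start < n; tries++)
        nanosleep(&ts, NULL);          /* may return early on EINTR; fine */
    return reaped_count - start;
}
```

Usage: call install_sigchld_reaper() once at startup, before any fork(). One design consequence: a blanket reaper also collects children that other code expects to waitpid() for by pid, so that code would then get ECHILD; a server that tracks ``known'' children would need the handler to record their exit status instead.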
(2) I suspect, though from lack of experience I am not certain, that
the `Interrupted system call' problem can be *really* fixed by calling
sigaction() with SA_RESTART for every signal AOLserver handles. But the
only use of SA_RESTART in the AOLserver code is the SIGHUP handler in
sproc.cpp.
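To see what SA_RESTART buys in (2), here is a small C demonstration, a sketch assuming a POSIX system with setitimer(); the names on_alarm and read_during_signals are hypothetical. A periodic SIGALRM interrupts a read() blocked on an empty pipe. With SA_RESTART the kernel restarts the read() after each tick until the handler's third tick writes a byte; without it, the first tick makes read() fail with EINTR.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

static int pipefd[2];
static volatile sig_atomic_t ticks = 0;

/* SIGALRM handler: on the third tick, make the pipe readable.
 * write() is async-signal-safe; errno is preserved for the interrupted call. */
static void on_alarm(int signo)
{
    int saved_errno = errno;

    (void)signo;
    if (++ticks == 3) {
        ssize_t n = write(pipefd[1], "x", 1);
        (void)n;
    }
    errno = saved_errno;
}

/* Block in read() on an empty pipe while SIGALRM fires every 20 ms, with
 * or without SA_RESTART. Returns read()'s result, with read()'s errno. */
static ssize_t read_during_signals(int use_sa_restart)
{
    struct sigaction sa;
    struct itimerval tv;
    char c;
    ssize_t r;
    int saved_errno;

    if (pipe(pipefd) != 0)
        return -2;
    ticks = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &sa, NULL);

    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 20000;       /* first tick after 20 ms */
    tv.it_interval.tv_usec = 20000;    /* then every 20 ms */
    setitimer(ITIMER_REAL, &tv, NULL);

    r = read(pipefd[0], &c, 1);        /* blocks until data or EINTR */
    saved_errno = errno;

    memset(&tv, 0, sizeof tv);         /* disarm the timer */
    setitimer(ITIMER_REAL, &tv, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    errno = saved_errno;
    return r;
}
```

One caveat for the ``all signals'' plan: SA_RESTART only covers calls the kernel is willing to restart. On Linux, for example, signal(7) lists select(), poll() and nanosleep() among the calls that fail with EINTR regardless of SA_RESTART, so those call sites would still need an explicit retry.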
I'm happy to contribute code if these seem like good ideas. Just
thought I'd ask first.