On Wed, Oct 15, 2003 at 07:44:57PM +0100, Colm MacCarthaigh wrote:
>
> One of our servers is getting truly hammered today (5,000 simultaneous
> clients is not unusual right now), so I've been tinkering with worker
> instead of prefork. It's not doing nice things for me :(
>
> The master (running-as-root) httpd is segfaulting after a few minutes, but
> the children are staying around (for all the good that does ;) :
>
> wait4(-1, 0xbffff620, WNOHANG|WUNTRACED, NULL) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> wait4(-1, 0xbffff620, WNOHANG|WUNTRACED, NULL) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> wait4(-1, 0xbffff620, WNOHANG|WUNTRACED, NULL) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> fork() = 29289
> wait4(-1, 0xbffff620, WNOHANG|WUNTRACED, NULL) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> --- SIGSEGV (Segmentation fault) ---

You don't by any chance have NEED_WAITPID defined somewhere, do you?
I found what looks like a bug that could definitely cause this, but
only if NEED_WAITPID is defined. Take a look at server/mpm_common.c
around line 233 (search for the call to reap_children()). The return
value of reap_children() is overwriting the pointer to the apr_proc_t
struct, and then right after the call to apr_sleep() (which lines up
with the call to select() in your trace above) we get a segfault.
Could you send me your httpd binary and the core file?
In the meantime, if it turns out you do have NEED_WAITPID defined,
try this patch:
Index: server/mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.102.2.4
diff -u -u -r1.102.2.4 mpm_common.c
--- server/mpm_common.c 15 May 2003 20:28:18 -0000 1.102.2.4
+++ server/mpm_common.c 15 Oct 2003 20:27:53 -0000
@@ -230,7 +230,7 @@
}
#ifdef NEED_WAITPID
- if ((ret = reap_children(exitcode, status)) > 0) {
+ if ((rv = reap_children(exitcode, status)) > 0) {
return;
}
#endif
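
To make the failure mode concrete, here is a stripped-down sketch of the
two shapes of that line. It is not the httpd code: proc_sketch_t,
reap_children_sketch(), wait_buggy() and wait_fixed() are made-up
stand-ins, and I'm assuming the code after the sleep writes through the
struct pointer (something like ret->pid = -1). The buggy version reports
the clobbered pointer instead of dereferencing it, so it is safe to run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for apr_proc_t; only the pid field matters here. */
typedef struct { int pid; } proc_sketch_t;

/* Hypothetical stand-in for reap_children(): reports "nothing reaped". */
static int reap_children_sketch(void)
{
    return 0;
}

/* Buggy shape: the int result is stored in the caller's struct pointer.
 * Returns 1 if the write through ret happened, 0 if ret was clobbered. */
static int wait_buggy(proc_sketch_t *ret)
{
    /* The cast only silences the compiler; it is not in the original. */
    ret = (proc_sketch_t *)(intptr_t)reap_children_sketch();
    if (ret != NULL) {
        return 0;           /* non-zero result: ret is now a bogus address */
    }
    /* ... the real code would sleep here ... */
    /* ret is NULL now; writing ret->pid at this point would segfault. */
    return 0;
}

/* Fixed shape: the result goes into a separate status variable. */
static int wait_fixed(proc_sketch_t *ret)
{
    int rv;

    if ((rv = reap_children_sketch()) > 0) {
        return 1;           /* a child was reaped, nothing more to do */
    }
    /* ... the real code would sleep here ... */
    ret->pid = -1;          /* ret still points at the caller's struct */
    return 1;
}
```

Calling wait_buggy() with a valid struct leaves its pid untouched and
returns 0, because the pointer was gone before the write; wait_fixed()
on the same struct sets pid to -1 and returns 1.
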
-aaron