Justin Erenkrantz wrote:
>
> Anybody have any thoughts on this? I haven't received any responses.
> This bug is a big PITA on Solaris...
>
ok, I'm starting to catch up with you on this. Thanks a bunch for all
the digging you did :-)
> I think this patch might work (haven't tested it), but I'm not really
> sure.
I will test this...it looks quite reasonable after reading your analysis
and some of the 1.3 code. Unfortunately I don't have a Solaris 2.7
account, but if it fixes Solaris 2.6 and 8, it ought to make the 2.7
customer happy.
> Someone who knows the otherchild code would be able to tell
> for sure. -- justin
1.3 otherchild didn't look too bad when I plowed thru it earlier.
However, it seems like there was a misunderstanding of when
to use the various OC_REASON_blah's.
>
> Index: http_main.c
> ===================================================================
> RCS file: /home/cvspublic/apache-1.3/src/main/http_main.c,v
> retrieving revision 1.535
> diff -u -r1.535 http_main.c
> --- http_main.c 2001/04/12 17:49:26 1.535
> +++ http_main.c 2001/06/01 23:50:35
> @@ -2492,7 +2492,7 @@
> waitret = waitpid(ocr->pid, &status, WNOHANG);
> if (waitret == ocr->pid) {
> ocr->pid = -1;
> - (*ocr->maintenance) (OC_REASON_DEATH, ocr->data, (ap_wait_t)status);
> + (*ocr->maintenance) (OC_REASON_RESTART, ocr->data, (ap_wait_t)status);
> }
> else if (waitret == 0) {
> (*ocr->maintenance) (OC_REASON_RESTART, ocr->data, (ap_wait_t)-1);
>
> >
> > The symptoms that I see on Solaris are thus:
> >
> > 1) The rotatelogs process for ErrorLog directive loses its parent
> > after startup. Hence, its ppid is init. I'm not sure how this
> > plays in, but this doesn't look right. I also can't recreate 2
> > without having ErrorLog be a piped log.
> >
that seems bogus all right, but I think your fix for 2) will cure our
immediate pain.
> > 2) On shutdown (via SIGTERM), a race condition occurs.
> > reclaim_child_processes kills all of the httpd children (fine),
> > but the second half of the r_c_p call is a bit odd. Basically,
> > it calls piped_log_maint with OC_REASON_DEATH - this triggers
> > piped_log_spawn to start up a new child since pl->program isn't
> > NULL. This is completely and utterly wrong. We're supposed to
> > be shutting down, not starting up. These new children never
> > receive the SIGTERM - they will stick around until another
> > SIGTERM occurs. The old children have already quit due to the
> > SIGTERM, but the piped_log_spawn starts up new rotatelogs
> > processes.
> >
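To check that I am reading 2) correctly, here is a minimal sketch of the
behavior as I understand it. It is not the real http_log.c code: the struct,
the function names (maint_sketch, spawn_sketch) and the enum values are made
up for illustration, and the "RESTART never respawns" branch is my assumption
about what the patch relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Reason codes named after the OC_REASON_* constants in the thread.
 * The numeric values here are illustrative, not the real ones. */
enum oc_reason { OC_REASON_DEATH, OC_REASON_RESTART };

/* Hypothetical stand-in for the piped log record. */
struct piped_log_sketch {
    const char *program;  /* non-NULL means a log program is configured */
    int spawn_count;      /* number of times the sketch "spawned" a child */
};

/* Stand-in for piped_log_spawn: it only counts calls, it does not fork. */
static void spawn_sketch(struct piped_log_sketch *pl)
{
    pl->spawn_count++;
}

/* Simplified maintenance callback (assumed shape, not the 1.3 source). */
static void maint_sketch(enum oc_reason reason, struct piped_log_sketch *pl)
{
    assert(pl != NULL);
    switch (reason) {
    case OC_REASON_DEATH:
        /* The logger died unexpectedly: respawn if a program is set.
         * During shutdown this is exactly the unwanted new child. */
        if (pl->program != NULL) {
            spawn_sketch(pl);
        }
        break;
    case OC_REASON_RESTART:
        /* Assumption: a restart/shutdown notice never respawns. */
        break;
    }
}
```

With program set, calling maint_sketch with OC_REASON_DEATH bumps
spawn_count, while OC_REASON_RESTART leaves it alone. If the real callback
behaves like this, passing OC_REASON_RESTART from the shutdown path should
stop the stray rotatelogs processes from being started.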