Re: [mod_fcgid patch] reap children without the zombie scan

Ricardo Cantu Sun, 04 Oct 2009 16:56:59 -0700

On Saturday 03 October 2009 12:44:02 pm Jeff Trawick wrote:
> On Thu, Oct 1, 2009 at 8:01 PM, Jeff Trawick <[email protected]> wrote:
> > (just fixing subject)
> >
> > On Thu, Oct 1, 2009 at 7:55 PM, Ricardo Cantu <[email protected]>wrote:
> >> On Tuesday 29 September 2009 4:20:49 pm you wrote:
> >> > On Tue, Sep 29, 2009 at 4:59 PM, Ricardo Cantu <[email protected]>
> >>
> >> wrote:
> >> > > On Tuesday 29 September 2009 2:31:21 pm Jeff Trawick wrote:
> >> > > > ZombieScanInterval (leave alone until processes can be reaped
> >> > >
> >> > > differently)
> >> > > Working on a patch for this one. Don't want to duplicate work, so
> >> > > let
> >>
> >> me
> >>
> >> > > know
> >> > > if anybody else is working on this.
> >> >
> >> > not me
> >> >
> >> > I hope that, for Unix, processes can be reaped as with the MPMs:
> >> > instead
> >>
> >> of
> >>
> >> > asking if a specific pid has exited (for each pid in the list), ask if
> >>
> >> any
> >>
> >> > pid has exited and if so find it in the list and handle.
> >>
> >> Well, here it is. My patch to reap the children when they exit rather
> >> than check the list for zombies. Before I take out the old logic for the
> >> zombie scan I would like to hear some input on the code.
> >>
> >> basically,
> >>
> >> apr_proc_other_child_register()  - to register a callback when child
> >> exits.
> >>
> >> sigaction(SIGCHLD, &sa, NULL) - to listen for children dying.
> >>
> >> apr_proc_other_child_refresh_all(APR_OC_REASON_RESTART) - called when
> >> SIGCHLD
> >> received so callback will be called on the correct registered child.
> >>
> >> fcgid_child_maint - The callback. Cleans up the various lists and prints
> >> out
> >> log info.
> >>
> >> And that's it.
> 
> Is this right?
> 
> before:
> 
> when process mgr has awakened (every second) and ZombieScanInterval has
> elapsed:
> 
> lock table
> foreach table entry (process spawned by process mgr) {
>     call waitpid on pid and see if it has exited
>     if exited, move to free list
> }
> unlock table
> 
> and this was done whether or not a child exited
> 
> now:
> 
> when awakened by SIGCHILD, in signal handler:
If one process exits every minute and the proc array is 40 processes long it 
would go from ~60 x 40 scans to 1 x40. Seems like an improvement.


> 
> foreach registered other child (process spawned by process mgr) {
>     call waitpid on pid and see if it has exited
>     if exited {
I guess I could do a  wait_all_procs here instead.

>         lock table

>         search table for child and move to free list
This can never be eliminated due to the proc's being in a simple circular 
list. We could change it to a double linked and not have to search.

>         unlock table
>     }
> }
> 
> If so:  I don't see that using the other child API has bought us much.  At
> least we don't lock the table unless a process has actually exited.
> 
> If the process mgr calls waitpid with a wildcard to see if any pid has
> exited, then at least a list doesn't have to be searched unless there is
> work to do.
true with this patch as well.

> The prefork MPM does that, then calls
> apr_proc_other_child_alert() to see if it is for an other-child.  (The
>  alert function then runs the list of other children to find the pid.)
> 
> SIGCHLD can be used to unblock from waiting for (command, 1 second)
>  sooner*, but calling interesting code from the SIGCHLD handler should be
>  avoided if at all possible.  (Windows wouldn't have that flow of control
>  either.)
concede on that point.
 
> 
> *I dunno if the mpm_common function currently used will return on EINTR
it does.

> 
> Thoughts?
Okay, instead of using SIGCHLD to unblock the pm, do you propose making it a 
nowait type read and do the wait_all_procs every cycle or perhaps a reaping 
thread to handle the children and leave the pm alone? From what I can tell 
prefork.c does not block (ap_wait_or_timeout uses ap_wait_all_procs(ret, 
status, APR_NOWAIT, p); which does not block) unlike pm_main that blocks at 
procmgr_peek_cmd waiting for a request. 

recap of directions:
1. make pm_main non-blocking and do wait_all_procs/apr_proc_other_child_alert 
per cycle.

or 

2. Create a thread to do wait_all_procs/apr_proc_other_child_alert and leave 
pm_main alone.

or

3. Change my patch to use wait_all_procs/apr_proc_other_child_alert on 
SIGCHLD.

or

4. unblock pm_main some how on SIGCHLD and make wait_all_procs part of a 
normal cycle of a request.

> 
> (BTW, it is easier on reviewers if one patch tries to solve exactly one
> problem.)
got it

Re: [mod_fcgid patch] reap children without the zombie scan

Reply via email to