On Saturday 03 October 2009 12:44:02 pm Jeff Trawick wrote:
> On Thu, Oct 1, 2009 at 8:01 PM, Jeff Trawick <[email protected]> wrote:
> > (just fixing subject)
> >
> > On Thu, Oct 1, 2009 at 7:55 PM, Ricardo Cantu <[email protected]>wrote:
> >> On Tuesday 29 September 2009 4:20:49 pm you wrote:
> >> > On Tue, Sep 29, 2009 at 4:59 PM, Ricardo Cantu <[email protected]>
> >>
> >> wrote:
> >> > > On Tuesday 29 September 2009 2:31:21 pm Jeff Trawick wrote:
> >> > > > ZombieScanInterval (leave alone until processes can be reaped
> >> > >
> >> > > differently)
> >> > > Working on a patch for this one. Don't want to duplicate work, so
> >> > > let
> >>
> >> me
> >>
> >> > > know
> >> > > if anybody else is working on this.
> >> >
> >> > not me
> >> >
> >> > I hope that, for Unix, processes can be reaped as with the MPMs:
> >> > instead
> >>
> >> of
> >>
> >> > asking if a specific pid has exited (for each pid in the list), ask if
> >>
> >> any
> >>
> >> > pid has exited and if so find it in the list and handle.
> >>
> >> Well, here it is. My patch to reap the children when they exit rather
> >> than check the list for zombies. Before I take out the old logic for the
> >> zombie scan I would like to hear some input on the code.
> >>
> >> basically,
> >>
> >> apr_proc_other_child_register() - to register a callback when child
> >> exits.
> >>
> >> sigaction(SIGCHLD, &sa, NULL) - to listen for children dying.
> >>
> >> apr_proc_other_child_refresh_all(APR_OC_REASON_RESTART) - called when
> >> SIGCHLD
> >> received so callback will be called on the correct registered child.
> >>
> >> fcgid_child_maint - The callback. Cleans up the various lists and prints
> >> out
> >> log info.
> >>
> >> And that's it.
>
> Is this right?
>
> before:
>
> when process mgr has awakened (every second) and ZombieScanInterval has
> elapsed:
>
> lock table
> foreach table entry (process spawned by process mgr) {
> call waitpid on pid and see if it has exited
> if exited, move to free list
> }
> unlock table
>
> and this was done whether or not a child exited
>
> now:
>
> when awakened by SIGCHILD, in signal handler:
If one process exits every minute and the proc array is 40 processes long it
would go from ~60 x 40 scans to 1 x40. Seems like an improvement.
>
> foreach registered other child (process spawned by process mgr) {
> call waitpid on pid and see if it has exited
> if exited {
I guess I could do a wait_all_procs here instead.
> lock table
> search table for child and move to free list
This can never be eliminated due to the proc's being in a simple circular
list. We could change it to a double linked and not have to search.
> unlock table
> }
> }
>
> If so: I don't see that using the other child API has bought us much. At
> least we don't lock the table unless a process has actually exited.
>
> If the process mgr calls waitpid with a wildcard to see if any pid has
> exited, then at least a list doesn't have to be searched unless there is
> work to do.
true with this patch as well.
> The prefork MPM does that, then calls
> apr_proc_other_child_alert() to see if it is for an other-child. (The
> alert function then runs the list of other children to find the pid.)
>
> SIGCHLD can be used to unblock from waiting for (command, 1 second)
> sooner*, but calling interesting code from the SIGCHLD handler should be
> avoided if at all possible. (Windows wouldn't have that flow of control
> either.)
concede on that point.
>
> *I dunno if the mpm_common function currently used will return on EINTR
it does.
>
> Thoughts?
Okay, instead of using SIGCHLD to unblock the pm, do you propose making it a
nowait type read and do the wait_all_procs every cycle or perhaps a reaping
thread to handle the children and leave the pm alone? From what I can tell
prefork.c does not block (ap_wait_or_timeout uses ap_wait_all_procs(ret,
status, APR_NOWAIT, p); which does not block) unlike pm_main that blocks at
procmgr_peek_cmd waiting for a request.
recap of directions:
1. make pm_main non-blocking and do wait_all_procs/apr_proc_other_child_alert
per cycle.
or
2. Create a thread to do wait_all_procs/apr_proc_other_child_alert and leave
pm_main alone.
or
3. Change my patch to use wait_all_procs/apr_proc_other_child_alert on
SIGCHLD.
or
4. unblock pm_main some how on SIGCHLD and make wait_all_procs part of a
normal cycle of a request.
>
> (BTW, it is easier on reviewers if one patch tries to solve exactly one
> problem.)
got it