Hi --

   Someone tried to send me a fax in the middle of the night,
so I've been up for a while and I think I've realized there are
several subtle contention issues involved with any fix for this issue.

   First of all, I should note that my initial patch in Bugzilla
has a flaw; it needs an else clause around most of the logic in
perform_idle_server_maintenance().  With that in place, it works
pretty well for me.  However, there's a theoretical race condition
that I'll describe below.


   To summarize where we stand, we're trying to avoid the situation
where the parent process creates a worker MPM child, and then before
the child gets a chance to run very far, the parent's
perform_idle_server_maintenance() routine (hereafter p_i_s_m(),
per Greg Ames) checks and sees no/few running worker threads
for that child and spawns another child.

   Jeff Trawick suggested a process_score.mpm_state field used
much the way my patch uses it.  The parent sets it to SERVER_STARTING
after creating the child in make_child(), and the child sets it
to SERVER_READY when some workers have been created.  Then
p_i_s_m() can check it and, if not READY, bypass the count of
workers and assume "they'll be ready soon".

   Greg Ames preferred the other approach I initially mentioned;
using the worker_score.status field only and adding a new
status value; let's call it SERVER_INIT.  In this case,
either the parent or the child would perform this logic shortly
after the child was created:

for (i = 0; i < ap_threads_per_child; i++) {
    if (status != SERVER_GRACEFUL && status != SERVER_DEAD) {
        ap_update_child_status_from_indexes(slot, i, SERVER_INIT, NULL);
    }
}

Then p_i_s_m() would accept this worker state as meaning "I'll
be ready soon".


   Sticking with the latter approach, I'll try to explain the
contention issue I see.  Fundamentally, it comes about because
normally only the parent changes process_score fields, while only
the child changes worker_score fields (synchronizing between
its threads and sometimes checking for an GRACEFUL or DEAD
status from another child's threads).

   Suppose the child runs the SERVER_INIT setup code above
in child_main(), before ap_run_child_init().  It seems to me
that it's at least theoretically possible that the parent's
p_i_s_m() still gets scheduled first by the OS and concludes
that another child needs to be created.

   Suppose instead that the parent does the setup code in
make_child(), right after successfully forking.  Then p_i_s_m()
can't see un-INIT worker status values.  However it's conceivable
that right after make_child() does its check for != GRACEFUL and
!= DEAD that the child's threads get scheduled and set the worker
status to READY; this then gets clobbered forever by the parent.

   One option might be to use apr_atomic_cas32() routines on the
worker status fields -- perhaps only required in make_child(), even.
That might be viable, but I confess I can't see all the
ramifications right now, and those status fields are updated
in lots of places via ap_update_child_status_from_indexes().
Let's call this option A.


   Reviewing the former approach, the same kinds of contention
exist because make_child() sets mpm_state to STARTING and
then child's start_threads() sets it to READY; if scheduling
causes them to be reversed, the child will always be treated
as having a full complement of "about to be ready" workers.

   It might be straightforward to use apr_atomic_cas32() here
since we're the only users of the field.  Let's call this
option B.  Still, based on scoreboard.h, I think it makes sense
if the parent is the only one to edit process_score fields:

/* stuff which the parent generally writes and the children rarely read */
typedef struct process_score process_score;


   Option C is somewhat more complex, but perhaps handles a
number of different issues.  Both a process_score.mpm_state
and a worker_score.internal field are added.

   The child process maintains two additional worker_score
structures at ap_threads_per_child and ap_threads_per_child+1,
these are marked internal and correspond to the listener
and start threads.  They just set their status to STARTING,
READY, GRACEFUL, and DEAD as appropriate.  No other changes
to the child process are needed, I believe.

   The parent process sets mpm_state to STARTING in make_child(),
and follows the logic of option B, except that atomics aren't
needed.  That's because in p_i_s_m() it checks the child's
start thread's internal worker_score structure, and if it is READY,
then it flips the mpm_state to READY.  Otherwise, if mpm_state
is STARTING, it does the bypass logic and assumes that all
workers "will be ready soon".

   This seems to me to have the advantage that only the parent
edits process_score and only the child edits worker_score, which
is conceptually clearer and reduces the ways contention issues
can creep in.  It does introduce some minor changes to make
sure there's room in the scoreboard for the extra two internal
threads and to make mod_status not see them, perhaps by
checking the internal flag in ap_get_scoreboard_worker().
It might also lead the way to dealing with this comment about
how to list non-worker threads in the scoreboard:

/* What state should this child_main process be listed as in the
 * scoreboard...?
...
 *  This state should be listed separately in the scoreboard, in some kind
 *  of process_status, not mixed in with the worker threads' status.
 *  "life_status" is almost right, but it's in the worker's structure, and
 *  the name could be clearer.   gla
 */

   I'm going to chew over these options this weekend and
hopefully have a patchset next week.  Advice welcome!

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Reply via email to