Hi -- Someone tried to send me a fax in the middle of the night, so I've been up for a while and I think I've realized there are several subtle contention issues involved with any fix for this issue.
First of all, I should note that my initial patch in Bugzilla has a flaw; it needs an else clause around most of the logic in perform_idle_server_maintenance(). With that in place, it works pretty well for me. However, there's a theoretical race condition that I'll describe below. To summarize where we stand, we're trying to avoid the situation where the parent process creates a worker MPM child, and then before the child gets a chance to run very far, the parent's perform_idle_server_maintenance() routine (hereafter p_i_s_m(), per Greg Ames) checks and sees no/few running worker threads for that child and spawns another child. Jeff Trawick suggested a process_score.mpm_state field used much the way my patch uses it. The parent sets it to SERVER_STARTING after creating the child in make_child(), and the child sets it to SERVER_READY when some workers have been created. Then p_i_s_m() can check it and, if not READY, bypass the count of workers and assume "they'll be ready soon". Greg Ames preferred the other approach I initially mentioned; using the worker_score.status field only and adding a new status value; let's call it SERVER_INIT. In this case, either the parent or the child would perform this logic shortly after the child was created: for (i = 0; i < ap_threads_per_child; i++) { if (status != SERVER_GRACEFUL && status != SERVER_DEAD) { ap_update_child_status_from_indexes(slot, i, SERVER_INIT, NULL); } } Then p_i_s_m() would accept this worker state as meaning "I'll be ready soon". Sticking with the latter approach, I'll try to explain the contention issue I see. Fundamentally, it comes about because normally only the parent changes process_score fields, while only the child changes worker_score fields (synchronizing between its threads and sometimes checking for an GRACEFUL or DEAD status from another child's threads). Suppose the child runs the SERVER_INIT setup code above in child_main(), before ap_run_child_init(). It seems to me that it's at least theoretically possible that the parent's p_i_s_m() still gets scheduled first by the OS and concludes that another child needs to be created. Suppose instead that the parent does the setup code in make_child(), right after successfully forking. Then p_i_s_m() can't see un-INIT worker status values. However it's conceivable that right after make_child() does its check for != GRACEFUL and != DEAD that the child's threads get scheduled and set the worker status to READY; this then gets clobbered forever by the parent. One option might be to use apr_atomic_cas32() routines on the worker status fields -- perhaps only required in make_child(), even. That might be viable, but I confess I can't see all the ramifications right now, and those status fields are updated in lots of places via ap_update_child_status_from_indexes(). Let's call this option A. Reviewing the former approach, the same kinds of contention exist because make_child() sets mpm_state to STARTING and then child's start_threads() sets it to READY; if scheduling causes them to be reversed, the child will always be treated as having a full complement of "about to be ready" workers. It might be straightforward to use apr_atomic_cas32() here since we're the only users of the field. Let's call this option B. Still, based on scoreboard.h, I think it makes sense if the parent is the only one to edit process_score fields: /* stuff which the parent generally writes and the children rarely read */ typedef struct process_score process_score; Option C is somewhat more complex, but perhaps handles a number of different issues. Both a process_score.mpm_state and a worker_score.internal field are added. The child process maintains two additional worker_score structures at ap_threads_per_child and ap_threads_per_child+1, these are marked internal and correspond to the listener and start threads. They just set their status to STARTING, READY, GRACEFUL, and DEAD as appropriate. No other changes to the child process are needed, I believe. The parent process sets mpm_state to STARTING in make_child(), and follows the logic of option B, except that atomics aren't needed. That's because in p_i_s_m() it checks the child's start thread's internal worker_score structure, and if it is READY, then it flips the mpm_state to READY. Otherwise, if mpm_state is STARTING, it does the bypass logic and assumes that all workers "will be ready soon". This seems to me to have the advantage that only the parent edits process_score and only the child edits worker_score, which is conceptually clearer and reduces the ways contention issues can creep in. It does introduce some minor changes to make sure there's room in the scoreboard for the extra two internal threads and to make mod_status not see them, perhaps by checking the internal flag in ap_get_scoreboard_worker(). It might also lead the way to dealing with this comment about how to list non-worker threads in the scoreboard: /* What state should this child_main process be listed as in the * scoreboard...? ... * This state should be listed separately in the scoreboard, in some kind * of process_status, not mixed in with the worker threads' status. * "life_status" is almost right, but it's in the worker's structure, and * the name could be clearer. gla */ I'm going to chew over these options this weekend and hopefully have a patchset next week. Advice welcome! Chris. -- GPG Key ID: 366A375B GPG Key Fingerprint: 485E 5041 17E1 E2BB C263 E4DE C8E3 FA36 366A 375B