Brian Pane wrote:
>
> I'm seeing a race condition in which the worker MPM logs the
> "long lost child came home!" warning message.  The test case
> is:
>   - run "ab -c5" to create a steady load on the httpd
>   - while it's running, do a graceful restart.
> This will sometimes yield the "long lost child" message.
>
> I added a bit of diagnostic logging and found that the order
> of events looks like this:
>   - child process for scoreboard slot X finishes its work and exits
>   - parent process forks a new child process and assigns it
>     to scoreboard slot X
>   - parent process notices that the first child process has
>     exited, looks for its pid in scoreboard, and doesn't find it
>
> Is this a harmless (and expected) warning case, or cause for
> alarm?

I think it's harmless.

If there are no scoreboard slots available that are completely empty,
one new process is allowed to "squat" on a scoreboard slot belonging to
a process that is quiescing and has some unused thread slots.  Look at
how perform_idle_server_maintenance manipulates the free_slots array,
starting around line 1417 in worker.c.

Assuming the "squatting" scenario happens, the new process could
overwrite the pid field(s) in the scoreboard.  Then when the SIGCHLD
logic in the parent kicks in, it may not be able to find the dying
process's pid in the scoreboard.

Bumping up ServerLimit to be significantly bigger than the number of
processes allowed by MaxClients will reduce the likelihood of
scoreboard squatting.

Could we do something to make the messages go away?  This is software,
so the only safe answer is "maybe".  I don't remember enough of the
details of how this all works to say if it would be easy.

Greg
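
For what it's worth, the ordering Brian describes can be replayed with a
toy model.  This is only a sketch: the array, the toy_* helpers, and the
pids are made up for illustration and are not the actual worker.c or
scoreboard code.  It just shows why a lookup by pid comes up empty once a
new process has taken over the slot.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define TOY_NUM_SLOTS 2   /* hypothetical, tiny scoreboard */

/* Toy scoreboard: one pid per process slot; 0 means "empty". */
static pid_t toy_scoreboard[TOY_NUM_SLOTS];

/* Record that `pid` now owns `slot`, overwriting any previous owner. */
static void toy_assign_slot(int slot, pid_t pid)
{
    assert(slot >= 0 && slot < TOY_NUM_SLOTS);
    toy_scoreboard[slot] = pid;
}

/* Return the slot whose pid field matches `pid`, or -1 if none does. */
static int toy_find_slot(pid_t pid)
{
    for (int i = 0; i < TOY_NUM_SLOTS; i++) {
        if (toy_scoreboard[i] == pid) {
            return i;
        }
    }
    return -1;
}

/* Replay the reported order of events.
 * Returns 1 if the parent would fail to find the exited child. */
static int toy_replay_race(void)
{
    pid_t old_child = 1001;   /* made-up pids */
    pid_t new_child = 1002;

    toy_assign_slot(0, old_child);  /* old child is serving in slot 0 */

    /* 1. old child finishes its work and exits (not yet reaped) */

    /* 2. parent forks a replacement and assigns it to the same slot */
    toy_assign_slot(0, new_child);

    /* 3. parent reaps the old child and looks for its pid */
    if (toy_find_slot(old_child) < 0) {
        printf("long lost child came home! (pid %ld)\n", (long)old_child);
        return 1;
    }
    return 0;
}
```

Calling toy_replay_race() from a main() prints the warning, because by
step 3 slot 0 already holds the new child's pid.  With spare empty slots
(the effect of a larger ServerLimit) the replacement would land somewhere
else and the lookup in step 3 would still succeed.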
