Aaron Bannert wrote:
Ok, after wading through the code for a while, I have a working theory:

1) Parent creates a child
2) Parent gets graceful-restart signal
3) Parent returns from ap_run_mpm, pconf is cleared, cross-process lock file
   is closed and removed.
4) Child finally gets scheduled to run the apr_proc_mutex_child_init for
   fcntl(). Oops, apr_file_open fails since step #3 above removed the file.
   Child errors out (ENOENT is returned from apr_file_open()) and dies.
5) Parent notices that child has died, errors out and dies completely.
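
The file-level effect of steps 3 and 4 can be sketched in plain POSIX C. This is only an illustration of the ordering, not httpd or APR code: it runs in a single process, the helper name open_after_parent_cleanup and the temp-file path are made up for the example, and open() stands in for the apr_file_open() call mentioned above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of httpd or APR): replays steps 3 and 4
 * in one process. Returns the errno seen by the late open, 0 if the open
 * unexpectedly succeeds, or -1 if the lock file could not be created. */
static int open_after_parent_cleanup(void)
{
    char path[] = "/tmp/lockfile-sketch-XXXXXX";

    /* The "parent" creates the cross-process lock file. */
    int parent_fd = mkstemp(path);
    if (parent_fd < 0)
        return -1;

    /* Step 3: parent cleanup closes and removes the lock file. */
    close(parent_fd);
    unlink(path);

    /* Step 4: the late "child" tries to reopen the file by name. */
    int child_fd = open(path, O_WRONLY);
    if (child_fd < 0)
        return errno;

    close(child_fd);
    return 0;
}
```

Calling open_after_parent_cleanup() should yield ENOENT, which matches the error described in step 4.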

sounds very possible


hopefully it is sane for the parent not to exit when a prior-generation child reports APEXIT_CHILDFATAL; but it looks like prefork checks for APEXIT_CHILDFATAL before checking whether the child belongs to the current generation

In any case, can anyone else confirm that this race condition exists, and
maybe suggest a way to synchronize a parent's shutdown with the starting
up of an old-generation child? (Eg. the parent shouldn't remove the
lockfile until all children are successfully started.)

removing the lockfile at the point where it is removed now shouldn't be a problem, and certainly that new child of the old generation should exit ASAP anyway since it has the old config; I suspect that if the parent ignores "fatal" exits of such children we'd be okay
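
A rough sketch of that check ordering, as standalone C rather than actual prefork code. Everything here is an assumption for illustration: the function name parent_should_abort, the generation arguments, and the locally defined SKETCH_EXIT_CHILDFATAL constant, which only stands in for httpd's APEXIT_CHILDFATAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for httpd's APEXIT_CHILDFATAL; the value is chosen
 * for illustration only. */
#define SKETCH_EXIT_CHILDFATAL 0xf

/* Hypothetical decision helper: the generation check comes first, so a
 * "fatal" exit from an old-generation child never takes the parent down. */
static bool parent_should_abort(int exit_code, int child_gen, int current_gen)
{
    /* Old-generation children are on their way out anyway; ignore them. */
    if (child_gen != current_gen)
        return false;

    /* Only a current-generation fatal exit is treated as fatal. */
    return exit_code == SKETCH_EXIT_CHILDFATAL;
}
```

With this ordering, parent_should_abort(SKETCH_EXIT_CHILDFATAL, 0, 1) is false, while the same exit code from a generation-1 child still aborts the parent.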


no guesses from me on whether this race condition is what causes the problem


