Prefork MPM and mod_watchdog

Rainer Jung Sun, 27 Jan 2019 05:40:59 -0800

Hi all,

since around 2.4.26 I notice occasional hangs during release testingonly on Solaris 10 Sparc when using MPM prefork.

Those hangs were always observed during proxy testing. The situation wasin most cases (all?), that only one running httpd process was left, soit would accept the incoming request but could not proxy them and eachrequest ran into the proxy timeout (I think 1 minute in the test). Theprocess was in fact an httpd child process. Parent process and otherchildren were gone.

I now narrowed it down some more. I am not sure, this is the onlysituation, but at least this is one I do observe most often:


- no problems with event or worker MPM

- no problems on REHL and SLES (Linux)

- even on Solaris, running without mod_watchdog (and therefore withoutmod_heartbeat, mod_heartmonitor, mod_lbmethod_heartbeat andmod_proxy_hcheck) it seems the hang does not occur.

- as soon as I enable mod_watchdog only (but not the above modules thatwould use it), the hangs start to happen every now and then.


The pattern seems to be (from the error log):

[Sat Jan 26 21:36:01.855207 2019] [core:trace4] [pid 7850]mpm_common.c(536): mpm child 7853 (gen 0/slot 0) started[Sat Jan 26 21:36:01.875826 2019] [mpm_prefork:notice] [pid 7850]AH00163: Apache/2.4.38 (Unix) OpenSSL/1.0.2 configured -- resumingnormal operations[Sat Jan 26 21:36:01.875975 2019] [mpm_prefork:info] [pid 7850] AH00164:Server built: Jan 18 2019 17:54:50[Sat Jan 26 21:36:01.876208 2019] [core:notice] [pid 7850] AH00094:Command line: '/path/to/bin/httpd -d /path/to/t -f/path/to/t/conf/httpd.conf -D APACHE2 -D APACHE2_4'[Sat Jan 26 21:36:01.876340 2019] [core:debug] [pid 7850] log.c(1571):AH02639: Using SO_REUSEPORT: no (1)[Sat Jan 26 21:36:01.876417 2019] [mpm_prefork:debug] [pid 7850]prefork.c(923): AH00165: Accept mutex: pthread (default: pthread)[Sat Jan 26 21:36:01.875753 2019] [core:trace4] [pid 7850]mpm_common.c(536): mpm child 7854 (gen 0/slot 1) started

...

[Sat Jan 26 21:36:18.969742 2019] [core:trace4] [pid 7850]mpm_common.c(536): mpm child 7889 (gen 0/slot 2) started

...
SHOULD BE FINE UNTIL HERE
...

[Sat Jan 26 21:38:29.016961 2019] [mpm_prefork:error] [pid 7850]AH00161: server reached MaxRequestWorkers setting, consider raising theMaxRequestWorkers setting

...
UNCLEAR, WHETHER THAT IS UNEXPECTED, BUT THEN THE REAL ERROR
...

[Sat Jan 26 21:42:48.213005 2019] [mpm_prefork:emerg] [pid 7889](45)Deadlock situation detected/avoided: AH00144: couldn't grab theaccept mutex[Sat Jan 26 21:42:48.490884 2019] [mpm_prefork:emerg] [pid 7853](45)Deadlock situation detected/avoided: AH00144: couldn't grab theaccept mutex

...

[Sat Jan 26 21:42:49.129401 2019] [core:alert] [pid 7850] AH00050: Child7853 returned a Fatal error... Apache is exiting![Sat Jan 26 21:42:49.129519 2019] [:emerg] [pid 7850] AH02818: MPM runfailed, exiting


bin/httpd -V
Server version: Apache/2.4.38 (Unix)
Server built:   Jan 18 2019 17:54:50
Server's Module Magic Number: 20120211:83
Server loaded:  APR 1.6.5, APR-UTIL 1.6.1
Compiled using: APR 1.6.5, APR-UTIL 1.6.1
Architecture:   32-bit
Server MPM:     prefork
  threaded:     no
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_PROC_PTHREAD_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="..."
 -D SUEXEC_BIN=".../bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

I remember Solaris having a very bad deadlock detection but that goesback to using fcntl locks. Here we are using pthread locks. Since thisis APR 1.6, my old observation about pthread timed lock problems onSolaris 10 also does not apply here.


I wonder what is special about prefork and mod_watchdog.

In addition, what I find strange is, that the accept mutex error lead tochild crashes, but not to a start of new child processes. Instead one ofthe two child crashes (the second one) seems to be noticed by the parent:

[core:alert] [pid 7850] AH00050: Child 7853 returned a Fatal error...Apache is exiting!

[:emerg] [pid 7850] AH02818: MPM run failed, exiting

and in fact the parent goes away but the third and last child processstays and continues to handle requests. That doesn't seem right.

I also see complains about reaching MaxRequestWorkers in the preforktests on Solaris. I haven't verified it, but it seems they run with afixed limit of three child processes. On the one hand, I would onlyexpect two of them to be actually used at any time (request plus proxiedrequest), since the tests are not executed in parallel. So the messageis a bit strange. On the other hand the limit three (if I am right)seems very limited.

Any hints, ideas, comments about prefork plus mod_watchdog, theincomplete shutdown after AH00050 plus AH02818, the limit of 3 childprocesses and the MaxRequestWorkers warning?


Questions, questions, ... ;)

Regards,

Rainer

Prefork MPM and mod_watchdog

Reply via email to