> another cause can be that you are using an external system to trigger log file rotation
That is an inspired guess! I am in fact using "cronolog" to manage logfile rotation. I don't *think* it's the culprit, since it rotates only monthly and crashes can happen many times a day (sometimes minutes apart). But removing it can at least narrow the variables down.

> requests that are getting stuck, and eventually you hit that socket timeout

It appears the crashes are pretty much immediate, within a second of a request (or multiple requests) coming in. The server may run 10 hours before the next crash, or it may crash again within 5 minutes. There does seem to be some correlation between request load and crashing likelihood - crashes almost always happen overnight when multiple cron jobs are hitting the program at once.

Meanwhile I'll stuff "WSGIRestrictEmbedded On" into the main httpd.conf and see what happens. I appreciate the debugging advice.

On Monday, August 31, 2020 at 5:16:48 PM UTC-4 Graham Dumpleton wrote:

>
> On 1 Sep 2020, at 7:06 am, David White <[email protected]> wrote:
>
> Unfortunately the main Apache log shows nothing except normal
> startup/shutdown messages. If the "sgm-prod" threads are being terminated
> and restarted, they are leaving no indication of that in the main server
> logs.
>
> The only clue appears to be that the crashing occurs during more heavy
> loads. The application does not often strain the CPU/RAM of the underlying
> host, but the Apache vhost logs show the crashes seem to occur when
> multiple requests are being handled nearly simultaneously.
>
> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 140546546816768] [client 10.83.210.200:47266] Truncated or oversized response headers received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 140546571994880] [client 10.31.82.142:35810] Truncated or oversized response headers received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>
> Am I likely to see any benefit to either:
>
> 1. Reducing the process or thread count passed to WSGIDaemonProcess, or
> 2. Putting Apache into "prefork" MPM?
>
> Never use prefork MPM if you can avoid it.
>
> Also ensure that outside of VirtualHost you have:
>
> WSGIRestrictEmbedded On
>
> to ensure you are never accidentally running in the main Apache child
> worker processes.
>
> If there is no evidence of a crash, then another cause can be that you are
> using an external system to trigger log file rotation, rather than
> recommended method of using Apache's own log file rotation method.
>
> An external log file rotation system usually only fires once per day, but
> maybe yours is set up differently based on size of logs.
>
> The problem with an external log file rotation system is that it signals
> Apache to restart. Although the main Apache child worker process are given
> a grace period to finish requests, the way Apache libraries implement
> management of third party processes such as the mod_wsgi daemon processes
> is that it will kill them after 5 seconds if they don't shutdown quick
> enough. This results in requests being proxied by Apache child worker
> processes being chopped off and you see that message.
>
> If it is this though, you should see clear messages at info level in logs
> from mod_wsgi that the daemon processes were being shutdown and restarted.
> This will appear some time before you see that message.
>
> Only other thing can think of is that you have requests that are getting
> stuck, and eventually you hit that socket timeout, although you have that
> set very large, so would take a request to be stuck for 6000 seconds before
> you saw it.
>
> Only way to tell if may be that is to use:
>
> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>
> so can signal mod_wsgi to dump Python stack traces periodically and see if
> a specific request is stuck some where.
>
> Graham
>
> Thanks, Graham.
>
> On Monday, August 31, 2020 at 4:56:46 PM UTC-4 Graham Dumpleton wrote:
>
>> Look in the main Apache error log, not the virtual host, and you will
>> probably find a message about a segmentation fault or other error.
>>
>> Where it is quite random like this, unless you can enable capture of the
>> core dump file and can run a debugger (gdb) on that to get a stack trace,
>> is going to be hard to track it down.
>>
>> Only other thing can suggest is watching the process size to see if
>> getting so large that you are running out of memory and a failure to
>> allocate memory causes something to crash.
>>
>> Graham
>>
>> On 1 Sep 2020, at 3:28 am, David White <[email protected]> wrote:
>>
>> Hello. I am running Apache 2.4.43 with mod_wsgi 4.71 compiled against
>> Python 3.8.3 (all manually compiled, not part of the RHEL 8.1 distro).
>> Apache MPM is "event".
>>
>> The application running in one of my virtual hosts will occasionally
>> crash, but randomly. Repeating the same request immediately will usually
>> succeed. The application may continue working for hours before randomly
>> crashing again.
>>
>> This started happening recently with an upgrade of the Flask and
>> SQLAlchemy modules.
>> (again, all manually installed via pip)
>>
>> A crash is reported this way in the vhost's error_log:
>>
>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 140546546816768] [client 10.83.210.200:47266] Truncated or oversized response headers received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 140546571994880] [client 10.31.82.142:35810] Truncated or oversized response headers received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>> [Mon Aug 31 11:11:08.512262 2020] [wsgi:info] [pid 40601:tid 140547110185856] mod_wsgi (pid=40601): Attach interpreter ''.
>> [Mon Aug 31 11:11:08.514415 2020] [wsgi:info] [pid 40601:tid 140547110185856] mod_wsgi (pid=40601): Adding '/apps/www/sgm/wsgi/SGM' to path.
>> [Mon Aug 31 11:11:08.514526 2020] [wsgi:info] [pid 40601:tid 140547110185856] mod_wsgi (pid=40601): Adding '/apps/vmscan/.virtualenvs/sgm/lib/python3.8/site-packages' to path.
>> [Mon Aug 31 11:11:08.515520 2020] [wsgi:debug] [pid 40601:tid 140546715096832] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 0 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515575 2020] [wsgi:debug] [pid 40601:tid 140546706704128] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 1 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515629 2020] [wsgi:debug] [pid 40601:tid 140546698311424] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 2 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515702 2020] [wsgi:debug] [pid 40601:tid 140546689918720] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 3 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515737 2020] [wsgi:debug] [pid 40601:tid 140546681526016] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 4 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515766 2020] [wsgi:debug] [pid 40601:tid 140546673133312] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 5 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515795 2020] [wsgi:debug] [pid 40601:tid 140546664740608] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 6 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515821 2020] [wsgi:debug] [pid 40601:tid 140546656347904] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 7 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515853 2020] [wsgi:debug] [pid 40601:tid 140546647955200] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 8 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515921 2020] [wsgi:debug] [pid 40601:tid 140546639562496] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 9 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515948 2020] [wsgi:debug] [pid 40601:tid 140546631169792] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 10 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515972 2020] [wsgi:debug] [pid 40601:tid 140546622777088] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 11 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516002 2020] [wsgi:debug] [pid 40601:tid 140546614384384] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 12 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516036 2020] [wsgi:debug] [pid 40601:tid 140546605991680] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 13 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516061 2020] [wsgi:debug] [pid 40601:tid 140546597598976] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started thread 14 in daemon process 'sgm-prod'.
>>
>> The WSGI portion of the configuration for the vhost in Apache looks like
>> this:
>>
>> WSGIPassAuthorization On
>> LogLevel info wsgi:trace6
>> SetEnv SGM_PRODUCTION 1
>> SetEnv SGM_USE_ORACLE 1
>> WSGIDaemonProcess sgm-prod user=sgmuser group=sgmgroup threads=15 python-home=/apps/sgmuser/.virtualenvs/sgm python-path=/apps/www/sgm/wsgi/SGM:/apps/sgmuser/.virtualenvs/sgm/lib/python3.8/site-packages socket-timeout=6000
>> WSGIScriptAlias / /apps/www/sgm/wsgi/SGM/run.py
>> <Directory /apps/www/sgm/wsgi/SGM>
>> Require all granted
>> AllowOverride AuthConfig
>> WSGIProcessGroup sgm-prod
>> WSGIApplicationGroup %{GLOBAL}
>> </Directory>
>>
>> The main Apache error log shows nothing relevant, and the
>> application-specific logs (with Python logging messages) show only that the
>> request was made. There is no exception traceback data being logged (I'm
>> guessing the app is crashing before that can happen.)
>>
>> Given how transitory this error is, I'm not sure how else to configure
>> Apache or the SGM app itself to get more details about why the "sgm-prod"
>> process seems to be crashing.
>>
>> Any help would be appreciated! Thanks.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com
>> <https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
