And another possibility is that your Linux OOM killer is killing the processes because they are using up too much memory.
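
A rough sketch of one way to check for that, in Python. It assumes the kernel logs its usual "Killed process <pid> (<name>)" message when the OOM killer fires, and that the mod_wsgi daemon processes show up under the name "httpd" (adjust the filter if yours are named differently). The function name find_oom_kills is made up for illustration.

```python
import re

# The OOM killer reports each victim with a line containing
# "Killed process <pid> (<name>)". Assumption: this wording is what
# your kernel version emits; check a real log line before relying on it.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, name_filter=None):
    """Return a list of (pid, process_name) for OOM kills found in log lines.

    lines       -- any iterable of log lines (an open file, a list, ...)
    name_filter -- optional substring the process name must contain
    """
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match is None:
            continue
        pid = int(match.group(1))
        name = match.group(2)
        if name_filter is None or name_filter in name:
            kills.append((pid, name))
    return kills
```

You could feed it the saved output of dmesg or journalctl -k, or an open kernel log file such as /var/log/messages (the path varies by distribution), and compare the reported pids and timestamps in the log against the times of the "Truncated or oversized response headers" errors.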
https://docs.memset.com/other/linux-s-oom-process-killer

Apache is quite susceptible to being killed by it because of how it uses resources. Check the logs it describes to see if it is being triggered.

> On 1 Sep 2020, at 9:18 am, Graham Dumpleton <[email protected]>
> wrote:
>
> There must be a core dump or segmentation fault message in main Apache error
> logs if the daemon process is crashing.
>
> One other remote possibility have seen with some third party libraries (C
> libraries, not so much Python wrappers), is that when they encounter an
> error, they call C library exit(), stopping the process. This results in no
> crash report.
>
> What are the third party Python packages you are using?
>
> Graham
>
>> On 1 Sep 2020, at 7:58 am, David White <[email protected]> wrote:
>>
>> > another cause can be that you are using an external system to trigger log
>> > file rotation
>>
>> That is an inspired guess! I am in fact using "cronolog" to manage logfile
>> rotation. I don't think it's the culprit, since it rotates only monthly and
>> crashes can happen many times a day (sometimes minutes apart). But removing
>> it can at least narrow the variables down.
>>
>> > requests that are getting stuck, and eventually you hit that socket timeout
>>
>> It appears the crashes are pretty much immediate, within a second of a
>> request (or multiple requests) coming in. The server may run 10 hours
>> before the next crash, or it may crash again within 5 minutes. There does
>> seem to be some correlation between request load and crashing likelihood -
>> crashes almost always happen overnight when multiple cron jobs are hitting
>> the program at once.
>>
>> Meanwhile I'll stuff "WSGIRestrictEmbedded On" into the main httpd.conf and
>> see what happens.
>>
>> I appreciate the debugging advice.
>>
>> On Monday, August 31, 2020 at 5:16:48 PM UTC-4 Graham Dumpleton wrote:
>>
>>> On 1 Sep 2020, at 7:06 am, David White <[email protected]> wrote:
>>>
>>> Unfortunately the main Apache log shows nothing except normal
>>> startup/shutdown messages. If the "sgm-prod" threads are being terminated
>>> and restarted, they are leaving no indication of that in the main server
>>> logs.
>>>
>>> The only clue appears to be that the crashing occurs during more heavy
>>> loads. The application does not often strain the CPU/RAM of the underlying
>>> host, but the Apache vhost logs show the crashes seem to occur when
>>> multiple requests are being handled nearly simultaneously.
>>>
>>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid
>>> 140546546816768] [client 10.83.210.200:47266]
>>> Truncated or oversized response headers received from daemon process
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid
>>> 140546571994880] [client 10.31.82.142:35810]
>>> Truncated or oversized response headers received from daemon process
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>>
>>> Am I likely to see any benefit to either:
>>> Reducing the process or thread count passed to WSGIDaemonProcess, or
>>> Putting Apache into "prefork" MPM?
>>
>> Never use prefork MPM if you can avoid it.
>>
>> Also ensure that outside of VirtualHost you have:
>>
>> WSGIRestrictEmbedded On
>>
>> to ensure you are never accidentally running in the main Apache child worker
>> processes.
>>
>> If there is no evidence of a crash, then another cause can be that you are
>> using an external system to trigger log file rotation, rather than
>> recommended method of using Apache's own log file rotation method.
>>
>> An external log file rotation system usually only fires once per day, but
>> maybe yours is set up differently based on size of logs.
>>
>> The problem with an external log file rotation system is that it signals
>> Apache to restart. Although the main Apache child worker process are given a
>> grace period to finish requests, the way Apache libraries implement
>> management of third party processes such as the mod_wsgi daemon processes is
>> that it will kill them after 5 seconds if they don't shutdown quick enough.
>> This results in requests being proxied by Apache child worker processes
>> being chopped off and you see that message.
>>
>> If it is this though, you should see clear messages at info level in logs
>> from mod_wsgi that the daemon processes were being shutdown and restarted.
>> This will appear some time before you see that message.
>>
>> Only other thing can think of is that you have requests that are getting
>> stuck, and eventually you hit that socket timeout, although you have that
>> set very large, so would take a request to be stuck for 6000 seconds before
>> you saw it.
>>
>> Only way to tell if may be that is to use:
>>
>> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>>
>> so can signal mod_wsgi to dump Python stack traces periodically and see if a
>> specific request is stuck some where.
>>
>> Graham
>>
>>
>>> Thanks, Graham.
>>>
>>>
>>> On Monday, August 31, 2020 at 4:56:46 PM UTC-4 Graham Dumpleton wrote:
>>> Look in the main Apache error log, not the virtual host, and you will
>>> probably find a message about a segmentation fault or other error.
>>>
>>> Where it is quite random like this, unless you can enable capture of the
>>> core dump file and can run a debugger (gdb) on that to get a stack trace,
>>> is going to be hard to track it down.
>>>
>>> Only other thing can suggest is watching the process size to see if getting
>>> so large that you are running out of memory and a failure to allocate
>>> memory causes something to crash.
>>>
>>> Graham
>>>
>>>
>>>> On 1 Sep 2020, at 3:28 am, David White <[email protected]> wrote:
>>>>
>>>>
>>>> Hello. I am running Apache 2.4.43 with mod_wsgi 4.71 compiled against
>>>> Python 3.8.3 (all manually compiled, not part of the RHEL 8.1 distro).
>>>> Apache MPM is "event".
>>>>
>>>> The application running in one of my virtual hosts will occasionally
>>>> crash, but randomly. Repeating the same request immediately will usually
>>>> succeed. The application may continue working for hours before randomly
>>>> crashing again.
>>>>
>>>> This started happening recently with an upgrade of the Flask and
>>>> SQLAlchemy modules. (again, all manually installed via pip)
>>>>
>>>> A crash is reported this way in the vhost's error_log:
>>>>
>>>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid
>>>> 140546546816768] [client 10.83.210.200:47266]
>>>> Truncated or oversized response headers
>>>> received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid
>>>> 140546571994880] [client 10.31.82.142:35810]
>>>> Truncated or oversized response headers received from daemon process
>>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>>> [Mon Aug 31 11:11:08.512262 2020] [wsgi:info] [pid 40601:tid
>>>> 140547110185856] mod_wsgi (pid=40601): Attach interpreter ''.
>>>> [Mon Aug 31 11:11:08.514415 2020] [wsgi:info] [pid 40601:tid
>>>> 140547110185856] mod_wsgi (pid=40601): Adding '/apps/www/sgm/wsgi/SGM' to
>>>> path.
>>>> [Mon Aug 31 11:11:08.514526 2020] [wsgi:info] [pid 40601:tid
>>>> 140547110185856] mod_wsgi (pid=40601): Adding
>>>> '/apps/vmscan/.virtualenvs/sgm/lib/python3.8/site-packages' to path.
>>>> [Mon Aug 31 11:11:08.515520 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546715096832] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 0 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515575 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546706704128] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 1 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515629 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546698311424] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 2 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515702 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546689918720] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 3 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515737 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546681526016] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 4 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515766 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546673133312] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 5 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515795 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546664740608] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 6 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515821 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546656347904] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 7 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515853 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546647955200] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 8 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515921 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546639562496] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 9 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515948 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546631169792] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 10 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515972 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546622777088] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 11 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516002 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546614384384] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 12 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516036 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546605991680] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 13 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516061 2020] [wsgi:debug] [pid 40601:tid
>>>> 140546597598976] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601):
>>>> Started thread 14 in daemon process 'sgm-prod'.
>>>>
>>>> The WSGI portion of the configuration for the vhost in Apache looks like
>>>> this:
>>>>
>>>> WSGIPassAuthorization On
>>>> LogLevel info wsgi:trace6
>>>> SetEnv SGM_PRODUCTION 1
>>>> SetEnv SGM_USE_ORACLE 1
>>>> WSGIDaemonProcess sgm-prod user=sgmuser group=sgmgroup threads=15
>>>>   python-home=/apps/sgmuser/.virtualenvs/sgm
>>>>   python-path=/apps/www/sgm/wsgi/SGM:/apps/sgmuser/.virtualenvs/sgm/lib/python3.8/site-packages
>>>>   socket-timeout=6000
>>>> WSGIScriptAlias / /apps/www/sgm/wsgi/SGM/run.py
>>>> <Directory /apps/www/sgm/wsgi/SGM>
>>>>   Require all granted
>>>>   AllowOverride AuthConfig
>>>>   WSGIProcessGroup sgm-prod
>>>>   WSGIApplicationGroup %{GLOBAL}
>>>> </Directory>
>>>>
>>>> The main Apache error log shows nothing relevant, and the
>>>> application-specific logs (with Python logging messages) show only that
>>>> the request was made. There is no exception traceback data being logged
>>>> (I'm guessing the app is crashing before that can happen.)
>>>>
>>>> Given how transitory this error is, I'm not sure how else to configure
>>>> Apache or the SGM app itself to get more details about why the "sgm-prod"
>>>> process seems to be crashing.
>>>>
>>>> Any help would be appreciated! Thanks.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com.
>
--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/15A2051C-A443-42EC-BBC0-B2F61AAC5D89%40gmail.com.
