There must be a core dump or segmentation fault message in the main Apache 
error logs if the daemon process is crashing.
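
If you want Python itself to leave a trace when a segmentation fault does 
happen, one option is the standard library faulthandler module. This is only a 
minimal sketch and nothing specific to mod_wsgi: the helper name 
enable_crash_tracebacks and the log path in the comment are made up for 
illustration, and I am assuming you would call it near the top of the WSGI 
script file. It writes to an explicit file, since I would not rely on 
sys.stderr having a real file descriptor inside the daemon process.

```python
import faulthandler

# Kept at module level so the file object (and its descriptor) stays open
# for the lifetime of the process; faulthandler writes to it at crash time.
_crash_log = None


def enable_crash_tracebacks(path):
    """Dump Python tracebacks of all threads to `path` on a fatal signal
    (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL)."""
    global _crash_log
    _crash_log = open(path, "a")
    faulthandler.enable(file=_crash_log, all_threads=True)
    return _crash_log


# Hypothetical usage near the top of the WSGI script file:
# enable_crash_tracebacks("/tmp/sgm-faulthandler.log")
```

The file has to be writable by the user the daemon process runs as. It only 
helps for fatal signals, so it will not show anything if the process goes away 
without one.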

One other remote possibility I have seen with some third party libraries (C 
libraries, not so much Python wrappers) is that when they encounter an error, 
they call the C library exit(), stopping the process. This results in no crash 
report.
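
To show what that looks like from the Python side, here is a small sketch. It 
is POSIX only and uses ctypes to stand in for a misbehaving C library; the 
CHILD script and the run_child name are purely illustrative and have nothing to 
do with your application. The child process calls the C exit() directly, so the 
Python "finally" block never runs and no traceback is produced.

```python
import subprocess
import sys

# Child program: calls the C library exit() through ctypes (POSIX only).
CHILD = """
import ctypes
try:
    ctypes.CDLL(None).exit(3)      # C exit(), not sys.exit()
finally:
    print("python cleanup ran")    # not reached: the process is already gone
"""


def run_child():
    """Run the child script in a separate interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_child()
    print("exit status:", result.returncode)
    print("stdout:", repr(result.stdout))
    print("stderr:", repr(result.stderr))
```

The parent only ever sees an exit status, which is why nothing useful ends up 
in the logs in this situation.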

What are the third party Python packages you are using?
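
If it helps in answering that, the following sketch lists the installed 
distributions that appear to ship compiled extension modules, since those are 
the ones of most interest here. It assumes Python 3.8 or later 
(importlib.metadata) and that you run it with the interpreter from the same 
virtual environment as the application. The function name is made up, and 
matching on the .so/.pyd suffix is a rough heuristic: it will miss C libraries 
that get loaded some other way.

```python
from importlib import metadata  # standard library in Python 3.8+


def packages_with_extensions():
    """Return sorted (name, version) pairs for installed distributions
    that contain at least one compiled extension module."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        files = dist.files or []  # files is None when no file list is recorded
        if name and any(str(f).endswith((".so", ".pyd")) for f in files):
            found[name] = dist.version
    return sorted(found.items())


if __name__ == "__main__":
    for name, version in packages_with_extensions():
        print(name, version)
```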

Graham

> On 1 Sep 2020, at 7:58 am, David White <dswhit...@gmail.com> wrote:
> 
> >  another cause can be that you are using an external system to trigger log 
> > file rotation  
> 
> That is an inspired guess!  I am in fact using "cronolog" to manage logfile 
> rotation.  I don't think it's the culprit, since it rotates only monthly and 
> crashes can happen many times a day (sometimes minutes apart).  But removing 
> it can at least narrow the variables down.  
> 
> > requests that are getting stuck, and eventually you hit that socket timeout
> 
> It appears the crashes are pretty much immediate, within a second of a 
> request (or multiple requests) coming in.  The server may run 10 hours before 
> the next crash, or it may crash again within 5 minutes.  There does seem to 
> be some correlation between request load and crashing likelihood - crashes 
> almost always happen overnight when multiple cron jobs are hitting the 
> program at once.
> 
> Meanwhile I'll stuff "WSGIRestrictEmbedded On" into the main httpd.conf and 
> see what happens.
> 
> I appreciate the debugging advice.
> 
> On Monday, August 31, 2020 at 5:16:48 PM UTC-4 Graham Dumpleton wrote:
> 
>> On 1 Sep 2020, at 7:06 am, David White <dswh...@gmail.com> wrote:
>> 
>> Unfortunately the main Apache log shows nothing except normal 
>> startup/shutdown messages.  If the "sgm-prod" threads are being terminated 
>> and restarted, they are leaving no indication of that in the main server 
>> logs.  
>> 
>> The only clue appears to be that the crashing occurs under heavier load.  
>> The application does not often strain the CPU/RAM of the underlying host, 
>> but the Apache vhost logs show the crashes seem to occur when multiple 
>> requests are being handled nearly simultaneously.  
>> 
>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
>> 140546546816768] [client 10.83.210.200:47266] 
>> Truncated or oversized response headers received from daemon process 
>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
>> 140546571994880] [client 10.31.82.142:35810] 
>> Truncated or oversized response headers received from daemon process 
>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py 
>> 
>> Am I likely to see any benefit to either:
>>   1. Reducing the process or thread count passed to WSGIDaemonProcess, or
>>   2. Putting Apache into "prefork" MPM?  
> 
> Never use prefork MPM if you can avoid it.
> 
> Also ensure that outside of VirtualHost you have:
> 
>    WSGIRestrictEmbedded On
> 
> to ensure you are never accidentally running in the main Apache child worker 
> processes.
> 
> If there is no evidence of a crash, then another cause can be that you are 
> using an external system to trigger log file rotation, rather than the 
> recommended method of using Apache's own log file rotation.
> 
> An external log file rotation system usually only fires once per day, but 
> maybe yours is set up differently, based on the size of the logs.
> 
> The problem with an external log file rotation system is that it signals 
> Apache to restart. Although the main Apache child worker processes are given 
> a grace period to finish requests, the way the Apache libraries manage third 
> party processes such as the mod_wsgi daemon processes is that they are killed 
> after 5 seconds if they don't shut down quickly enough. As a result, requests 
> being proxied by the Apache child worker processes are cut off and you see 
> that message.
> 
> If it is this though, you should see clear messages at info level in the logs 
> from mod_wsgi that the daemon processes were being shut down and restarted. 
> These will appear some time before you see that message.
> 
> The only other thing I can think of is that you have requests that are getting 
> stuck, and eventually you hit that socket timeout, although you have that set 
> very large, so a request would have to be stuck for 6000 seconds before you 
> saw it.
> 
> The only way to tell if that may be the case is to use:
> 
> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
> 
> so you can signal mod_wsgi to dump Python stack traces periodically and see 
> if a specific request is stuck somewhere.
> 
> Graham
> 
> 
>> Thanks, Graham.
>> 
>> 
>> On Monday, August 31, 2020 at 4:56:46 PM UTC-4 Graham Dumpleton wrote:
>> Look in the main Apache error log, not the virtual host, and you will 
>> probably find a message about a segmentation fault or other error.
>> 
>> Where it is quite random like this, unless you can enable capture of the 
>> core dump file and can run a debugger (gdb) on that to get a stack trace, it 
>> is going to be hard to track down.
>> 
>> The only other thing I can suggest is watching the process size to see if it 
>> is getting so large that you are running out of memory and a failure to 
>> allocate memory causes something to crash.
>> 
>> Graham
>> 
>> 
>>> On 1 Sep 2020, at 3:28 am, David White <dswh...@gmail.com> wrote:
>>> 
>>> Hello.  I am running Apache 2.4.43 with mod_wsgi 4.7.1 compiled against 
>>> Python 3.8.3 (all manually compiled, not part of the RHEL 8.1 distro).  
>>> Apache MPM is "event".
>>> 
>>> The application running in one of my virtual hosts will occasionally crash, 
>>> but randomly.  Repeating the same request immediately will usually succeed. 
>>>  The application may continue working for hours before randomly crashing 
>>> again.
>>> 
>>> This started happening recently with an upgrade of the Flask and SQLAlchemy 
>>> modules.  (again, all manually installed via pip)
>>> 
>>> A crash is reported this way in the vhost's error_log:
>>> 
>>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
>>> 140546546816768] [client 10.83.210.200:47266] 
>>> Truncated or oversized response headers received from daemon process 
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
>>> 140546571994880] [client 10.31.82.142:35810] 
>>> Truncated or oversized response headers received from daemon process 
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>> [Mon Aug 31 11:11:08.512262 2020] [wsgi:info] [pid 40601:tid 
>>> 140547110185856] mod_wsgi (pid=40601): Attach interpreter ''.
>>> [Mon Aug 31 11:11:08.514415 2020] [wsgi:info] [pid 40601:tid 
>>> 140547110185856] mod_wsgi (pid=40601): Adding '/apps/www/sgm/wsgi/SGM' to 
>>> path.
>>> [Mon Aug 31 11:11:08.514526 2020] [wsgi:info] [pid 40601:tid 
>>> 140547110185856] mod_wsgi (pid=40601): Adding 
>>> '/apps/vmscan/.virtualenvs/sgm/lib/python3.8/site-packages' to path.
>>> [Mon Aug 31 11:11:08.515520 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546715096832] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 0 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515575 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546706704128] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 1 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515629 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546698311424] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 2 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515702 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546689918720] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 3 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515737 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546681526016] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 4 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515766 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546673133312] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 5 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515795 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546664740608] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 6 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515821 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546656347904] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 7 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515853 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546647955200] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 8 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515921 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546639562496] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 9 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515948 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546631169792] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 10 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.515972 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546622777088] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 11 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.516002 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546614384384] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 12 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.516036 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546605991680] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 13 in daemon process 'sgm-prod'.
>>> [Mon Aug 31 11:11:08.516061 2020] [wsgi:debug] [pid 40601:tid 
>>> 140546597598976] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>>> thread 14 in daemon process 'sgm-prod'.
>>> 
>>> The WSGI portion of the configuration for the vhost in Apache looks like 
>>> this:
>>> 
>>> WSGIPassAuthorization On
>>> LogLevel info wsgi:trace6
>>> SetEnv SGM_PRODUCTION  1
>>> SetEnv SGM_USE_ORACLE  1
>>> WSGIDaemonProcess sgm-prod user=sgmuser group=sgmgroup threads=15 \
>>>     python-home=/apps/sgmuser/.virtualenvs/sgm \
>>>     python-path=/apps/www/sgm/wsgi/SGM:/apps/sgmuser/.virtualenvs/sgm/lib/python3.8/site-packages \
>>>     socket-timeout=6000
>>> WSGIScriptAlias / /apps/www/sgm/wsgi/SGM/run.py
>>> <Directory /apps/www/sgm/wsgi/SGM>
>>>     Require all granted
>>>     AllowOverride AuthConfig
>>>     WSGIProcessGroup sgm-prod
>>>     WSGIApplicationGroup %{GLOBAL}
>>> </Directory>
>>> 
>>> The main Apache error log shows nothing relevant, and the 
>>> application-specific logs (with Python logging messages) show only that the 
>>> request was made.  There is no exception traceback data being logged (I'm 
>>> guessing the app is crashing before that can happen.)
>>> 
>>> Given how transitory this error is, I'm not sure how else to configure 
>>> Apache or the SGM app itself to get more details about why the "sgm-prod" 
>>> process seems to be crashing.
>>> 
>>> Any help would be appreciated!  Thanks.
>>> 
>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "modwsgi" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to modwsgi+u...@googlegroups.com <>.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to modwsgi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/modwsgi/C9A4CBEE-9B07-4D41-A938-A948CA39EDDD%40gmail.com.
