> another cause can be that you are using an external system to trigger 
> log file rotation

That is an inspired guess!  I am in fact using "cronolog" to manage logfile 
rotation.  I don't *think* it's the culprit, since it rotates only monthly 
and crashes can happen many times a day (sometimes minutes apart).  But 
removing it can at least narrow the variables down.  

> requests that are getting stuck, and eventually you hit that socket 
> timeout

It appears the crashes are pretty much immediate, within a second of a 
request (or multiple requests) coming in.  The server may run 10 hours 
before the next crash, or it may crash again within 5 minutes.  There does 
seem to be some correlation between request load and crashing likelihood - 
crashes almost always happen overnight when multiple cron jobs are hitting 
the program at once.
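
One way to check that correlation against the vhost error_log is a small script like the one below.  This is a minimal sketch, not a tested tool: the function name `crashes_per_minute` is made up for illustration, and it assumes each log entry sits on a single line and uses the timestamp format shown in the excerpts quoted in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp at the start of an error_log entry, e.g.
# "[Mon Aug 31 11:11:07.504787 2020] [wsgi:error] ..."
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+ \d{4})\]")
MARKER = "Truncated or oversized response headers"


def crashes_per_minute(lines):
    """Count log lines containing MARKER, grouped by the minute they occurred."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
        counts[when.replace(second=0, microsecond=0)] += 1
    return counts
```

Passing an open error_log file object to `crashes_per_minute` and printing the sorted items gives a per-minute list that can be compared with the cron schedule.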

Meanwhile I'll stuff "WSGIRestrictEmbedded On" into the main httpd.conf and 
see what happens.
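
Since no Python traceback is being logged, another option may be to enable the standard library's `faulthandler` module early in run.py, so that a fatal signal such as a segmentation fault leaves a traceback of every thread behind.  The following is only a sketch under assumptions: the helper name `enable_crash_tracebacks` and the log directory are hypothetical, and it writes to its own file on the assumption that `sys.stderr` inside the daemon process may not expose a real file descriptor.

```python
import faulthandler
import os


def enable_crash_tracebacks(log_dir):
    """Write tracebacks of all threads to a per-process file on fatal signals.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
    and SIGILL.  The returned file object must stay open (keep a reference
    to it) for as long as the handler is enabled.
    """
    # One file per process, so output from separate daemon processes stays apart.
    path = os.path.join(log_dir, "faulthandler-%d.log" % os.getpid())
    handle = open(path, "a")
    faulthandler.enable(file=handle, all_threads=True)
    return handle


# Hypothetical usage near the top of the WSGI script; the directory is a
# placeholder and must be writable by the daemon process user:
# _crash_log = enable_crash_tracebacks("/path/to/writable/dir")
```

If the crash is a segfault inside a C extension, the file should then show which Python call each thread was in at the moment it died.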

I appreciate the debugging advice.

On Monday, August 31, 2020 at 5:16:48 PM UTC-4 Graham Dumpleton wrote:

>
> On 1 Sep 2020, at 7:06 am, David White <[email protected]> wrote:
>
> Unfortunately the main Apache log shows nothing except normal 
> startup/shutdown messages.  If the "sgm-prod" threads are being terminated 
> and restarted, they are leaving no indication of that in the main server 
> logs.  
>
> The only clue appears to be that the crashing occurs during more heavy 
> loads.  The application does not often strain the CPU/RAM of the underlying 
> host, but the Apache vhost logs show the crashes seem to occur when 
> multiple requests are being handled nearly simultaneously.  
>
> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
> 140546546816768] [client 10.83.210.200:47266] Truncated or oversized 
> response headers received from daemon process 'sgm-prod': 
> /apps/www/sgm/wsgi/SGM/run.py
> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
> 140546571994880] [client 10.31.82.142:35810] Truncated or oversized 
> response headers received from daemon process 'sgm-prod': 
> /apps/www/sgm/wsgi/SGM/run.py 
>
> Am I likely to see any benefit to either:
>
>    1. Reducing the process or thread count passed to WSGIDaemonProcess, or
>    2. Putting Apache into "prefork" MPM?  
>    
> Never use prefork MPM if you can avoid it.
>
> Also ensure that outside of VirtualHost you have:
>
>    WSGIRestrictEmbedded On
>
> to ensure you are never accidentally running in the main Apache child 
> worker processes.
>
> If there is no evidence of a crash, then another cause can be that you are 
> using an external system to trigger log file rotation, rather than 
> recommended method of using Apache's own log file rotation method.
>
> An external log file rotation system usually only fires once per day, but 
> maybe yours is set up differently based on size of logs.
>
> The problem with an external log file rotation system is that it signals 
> Apache to restart. Although the main Apache child worker processes are given 
> a grace period to finish requests, the way Apache libraries implement 
> management of third party processes such as the mod_wsgi daemon processes 
> is that it will kill them after 5 seconds if they don't shut down quickly 
> enough. This results in requests being proxied by Apache child worker 
> processes being chopped off and you see that message.
>
> If it is this though, you should see clear messages at info level in logs 
> from mod_wsgi that the daemon processes were being shutdown and restarted. 
> This will appear some time before you see that message.
>
> Only other thing can think of is that you have requests that are getting 
> stuck, and eventually you hit that socket timeout, although you have that 
> set very large, so would take a request to be stuck for 6000 seconds before 
> you saw it.
>
> The only way to tell whether that may be the cause is to use:
>
>
> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>
> so can signal mod_wsgi to dump Python stack traces periodically and see if 
> a specific request is stuck some where.
>
> Graham
>
> Thanks, Graham.
>
> On Monday, August 31, 2020 at 4:56:46 PM UTC-4 Graham Dumpleton wrote:
>
>> Look in the main Apache error log, not the virtual host, and you will 
>> probably find a message about a segmentation fault or other error.
>>
>> Where it is quite random like this, unless you can enable capture of the 
>> core dump file and can run a debugger (gdb) on that to get a stack trace, 
>> is going to be hard to track it down.
>>
>> Only other thing can suggest is watching the process size to see if 
>> getting so large that you are running out of memory and a failure to 
>> allocate memory causes something to crash.
>>
>> Graham
>>
>> On 1 Sep 2020, at 3:28 am, David White <[email protected]> wrote:
>>
>>
>> Hello.  I am running Apache 2.4.43 with mod_wsgi 4.7.1 compiled against 
>> Python 3.8.3 (all manually compiled, not part of the RHEL 8.1 distro).  
>> Apache MPM is "event".
>>
>> The application running in one of my virtual hosts will occasionally 
>> crash, but randomly.  Repeating the same request immediately will usually 
>> succeed.  The application may continue working for hours before randomly 
>> crashing again.
>>
>> This started happening recently with an upgrade of the Flask and 
>> SQLAlchemy modules.  (again, all manually installed via pip)
>>
>> A crash is reported this way in the vhost's error_log:
>>
>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
>> 140546546816768] [client 10.83.210.200:47266] Truncated or oversized 
>> response headers received from daemon process 'sgm-prod': 
>> /apps/www/sgm/wsgi/SGM/run.py
>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
>> 140546571994880] [client 10.31.82.142:35810] Truncated or oversized 
>> response headers received from daemon process 'sgm-prod': 
>> /apps/www/sgm/wsgi/SGM/run.py
>> [Mon Aug 31 11:11:08.512262 2020] [wsgi:info] [pid 40601:tid 
>> 140547110185856] mod_wsgi (pid=40601): Attach interpreter ''.
>> [Mon Aug 31 11:11:08.514415 2020] [wsgi:info] [pid 40601:tid 
>> 140547110185856] mod_wsgi (pid=40601): Adding '/apps/www/sgm/wsgi/SGM' to 
>> path.
>> [Mon Aug 31 11:11:08.514526 2020] [wsgi:info] [pid 40601:tid 
>> 140547110185856] mod_wsgi (pid=40601): Adding 
>> '/apps/vmscan/.virtualenvs/sgm/lib/python3.8/site-packages' to path.
>> [Mon Aug 31 11:11:08.515520 2020] [wsgi:debug] [pid 40601:tid 
>> 140546715096832] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 0 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515575 2020] [wsgi:debug] [pid 40601:tid 
>> 140546706704128] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 1 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515629 2020] [wsgi:debug] [pid 40601:tid 
>> 140546698311424] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 2 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515702 2020] [wsgi:debug] [pid 40601:tid 
>> 140546689918720] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 3 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515737 2020] [wsgi:debug] [pid 40601:tid 
>> 140546681526016] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 4 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515766 2020] [wsgi:debug] [pid 40601:tid 
>> 140546673133312] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 5 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515795 2020] [wsgi:debug] [pid 40601:tid 
>> 140546664740608] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 6 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515821 2020] [wsgi:debug] [pid 40601:tid 
>> 140546656347904] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 7 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515853 2020] [wsgi:debug] [pid 40601:tid 
>> 140546647955200] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 8 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515921 2020] [wsgi:debug] [pid 40601:tid 
>> 140546639562496] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 9 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515948 2020] [wsgi:debug] [pid 40601:tid 
>> 140546631169792] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 10 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.515972 2020] [wsgi:debug] [pid 40601:tid 
>> 140546622777088] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 11 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516002 2020] [wsgi:debug] [pid 40601:tid 
>> 140546614384384] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 12 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516036 2020] [wsgi:debug] [pid 40601:tid 
>> 140546605991680] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 13 in daemon process 'sgm-prod'.
>> [Mon Aug 31 11:11:08.516061 2020] [wsgi:debug] [pid 40601:tid 
>> 140546597598976] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): Started 
>> thread 14 in daemon process 'sgm-prod'.
>>
>> The WSGI portion of the configuration for the vhost in Apache looks like 
>> this:
>>
>> WSGIPassAuthorization On
>> LogLevel info wsgi:trace6
>> SetEnv SGM_PRODUCTION  1
>> SetEnv SGM_USE_ORACLE  1
>> WSGIDaemonProcess sgm-prod user=sgmuser group=sgmgroup threads=15 \
>>     python-home=/apps/sgmuser/.virtualenvs/sgm \
>>     python-path=/apps/www/sgm/wsgi/SGM:/apps/sgmuser/.virtualenvs/sgm/lib/python3.8/site-packages \
>>     socket-timeout=6000
>> WSGIScriptAlias / /apps/www/sgm/wsgi/SGM/run.py
>> <Directory /apps/www/sgm/wsgi/SGM>
>>     Require all granted
>>     AllowOverride AuthConfig
>>     WSGIProcessGroup sgm-prod
>>     WSGIApplicationGroup %{GLOBAL}
>> </Directory>
>>
>> The main Apache error log shows nothing relevant, and the 
>> application-specific logs (with Python logging messages) show only that the 
>> request was made.  There is no exception traceback data being logged (I'm 
>> guessing the app is crashing before that can happen.)
>>
>> Given how transitory this error is, I'm not sure how else to configure 
>> Apache or the SGM app itself to get more details about why the "sgm-prod" 
>> process seems to be crashing.
>>
>> Any help would be appreciated!  Thanks.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>>
>>
