Re: [modwsgi] Inconsistent crashes with "Truncated or oversized response headers"

Graham Dumpleton Mon, 31 Aug 2020 20:48:52 -0700

And another possibility is that your Linux OOM killer is killing the processes 
because they are using up too much memory.


https://docs.memset.com/other/linux-s-oom-process-killer 

Apache is quite susceptible to being killed by it because of how it uses 
resources.

Check the logs it describes to see if it is being triggered.
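
As a rough sketch of that check (not taken from the linked page), the snippet below scans kernel log lines for the messages the OOM killer typically writes. The function name find_oom_events and the two patterns are assumptions for illustration; the exact wording differs between kernel versions, so adjust the patterns to what your kernel actually logs.

```python
import re

# Patterns the Linux kernel commonly logs when the OOM killer runs.
# Assumption: wording varies by kernel version, so treat these as a starting point.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"),
]


def find_oom_events(lines):
    """Return (line, pid, name) tuples for lines that look like OOM killer activity.

    pid and name are None when the matching pattern does not capture them.
    """
    events = []
    for line in lines:
        for pattern in OOM_PATTERNS:
            match = pattern.search(line)
            if match:
                groups = match.groupdict()
                events.append((line.rstrip("\n"), groups.get("pid"), groups.get("name")))
                break  # one event per line is enough
    return events
```

You could pass it any iterable of lines, for example the captured output of `dmesg` or `journalctl -k`, and look for entries naming httpd around the time of a failure.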

> On 1 Sep 2020, at 9:18 am, Graham Dumpleton <[email protected]> 
> wrote:
> 
> There must be a core dump or segmentation fault message in the main Apache 
> error logs if the daemon process is crashing.
> 
> One other remote possibility I have seen with some third party libraries (C 
> libraries, not so much Python wrappers) is that when they encounter an 
> error, they call the C library exit(), stopping the process. This results in 
> no crash report.
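
To illustrate why such an exit leaves nothing to find in the logs, here is a small sketch. It uses Python's os._exit() as a stand-in for a C library ending the process (an assumption: the two are not identical, since C exit() runs atexit handlers and os._exit() does not). The names CHILD_CODE and run_child are hypothetical.

```python
import subprocess
import sys

# Child program: os._exit() ends the process immediately, standing in for
# a C library that terminates the process from inside an extension module.
CHILD_CODE = "import os; print('handling request', flush=True); os._exit(3)"


def run_child():
    """Run the child in a separate interpreter and return (status, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

The child starts its work and then vanishes with an exit status but no traceback and no segmentation fault message, which is the pattern described above.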
> 
> What are the third party Python packages you are using?
> 
> Graham
> 
>> On 1 Sep 2020, at 7:58 am, David White <[email protected]> wrote:
>> 
>> >  another cause can be that you are using an external system to trigger log 
>> > file rotation  
>> 
>> That is an inspired guess!  I am in fact using "cronolog" to manage logfile 
>> rotation.  I don't think it's the culprit, since it rotates only monthly and 
>> crashes can happen many times a day (sometimes minutes apart).  But removing 
>> it can at least narrow the variables down.  
>> 
>> > requests that are getting stuck, and eventually you hit that socket timeout
>> 
>> It appears the crashes are pretty much immediate, within a second of a 
>> request (or multiple requests) coming in.  The server may run 10 hours 
>> before the next crash, or it may crash again within 5 minutes.  There does 
>> seem to be some correlation between request load and crashing likelihood - 
>> crashes almost always happen overnight when multiple cron jobs are hitting 
>> the program at once.
>> 
>> Meanwhile I'll stuff "WSGIRestrictEmbedded On" into the main httpd.conf and 
>> see what happens.
>> 
>> I appreciate the debugging advice.
>> 
>> On Monday, August 31, 2020 at 5:16:48 PM UTC-4 Graham Dumpleton wrote:
>> 
>>> On 1 Sep 2020, at 7:06 am, David White <[email protected]> wrote:
>>> 
>>> Unfortunately the main Apache log shows nothing except normal 
>>> startup/shutdown messages.  If the "sgm-prod" threads are being terminated 
>>> and restarted, they are leaving no indication of that in the main server 
>>> logs.  
>>> 
>>> The only clue appears to be that the crashing occurs during more heavy 
>>> loads.  The application does not often strain the CPU/RAM of the underlying 
>>> host, but the Apache vhost logs show the crashes seem to occur when 
>>> multiple requests are being handled nearly simultaneously.  
>>> 
>>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
>>> 140546546816768] [client 10.83.210.200:47266] 
>>> Truncated or oversized response headers received from daemon process 
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
>>> 140546571994880] [client 10.31.82.142:35810] 
>>> Truncated or oversized response headers received from daemon process 
>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
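
One way to check the suspected link with load is to count how many of these errors land in the same second. The sketch below assumes each log entry sits on a single unwrapped line in the format shown above; the names ERROR_LINE and count_truncations_per_second are hypothetical.

```python
import re
from collections import Counter

# Matches an Apache 2.4 error-log entry like the ones quoted in this thread.
# Assumption: every entry is on one line (not wrapped as in the email).
ERROR_LINE = re.compile(
    r"^\[(?P<day>\w{3} \w{3} +\d+) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ (?P<year>\d{4})\] "
    r"\[wsgi:error\] \[pid (?P<pid>\d+):tid \d+\] "
    r".*Truncated or oversized response headers received from daemon process "
    r"'(?P<group>[^']+)'"
)


def count_truncations_per_second(lines):
    """Count truncated-header errors per (timestamp to the second, daemon group)."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            stamp = f"{match['day']} {match['time']} {match['year']}"
            counts[(stamp, match["group"])] += 1
    return counts
```

Seconds with a count above one point at several in-flight requests being cut off together, which would fit a whole daemon process going away rather than one request failing.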
>>> 
>>> Am I likely to see any benefit to either:
>>> 1. Reducing the process or thread count passed to WSGIDaemonProcess, or
>>> 2. Putting Apache into "prefork" MPM?
>> 
>> Never use prefork MPM if you can avoid it.
>> 
>> Also ensure that outside of VirtualHost you have:
>> 
>>    WSGIRestrictEmbedded On
>> 
>> to ensure you are never accidentally running in the main Apache child worker 
>> processes.
>> 
>> If there is no evidence of a crash, then another cause can be that you are 
>> using an external system to trigger log file rotation, rather than the 
>> recommended method of using Apache's own log file rotation.
>> 
>> An external log file rotation system usually only fires once per day, but 
>> maybe yours is set up differently based on size of logs.
>> 
>> The problem with an external log file rotation system is that it signals 
>> Apache to restart. Although the main Apache child worker processes are given 
>> a grace period to finish requests, the way Apache libraries implement 
>> management of third party processes such as the mod_wsgi daemon processes is 
>> that it will kill them after 5 seconds if they don't shut down quickly 
>> enough. This results in requests being proxied by Apache child worker 
>> processes being chopped off, and you see that message.
>> 
>> If it is this though, you should see clear messages at info level in logs 
>> from mod_wsgi that the daemon processes were being shut down and restarted. 
>> These will appear some time before you see that message.
>> 
>> The only other thing I can think of is that you have requests that are 
>> getting stuck, and eventually you hit that socket timeout, although you have 
>> that set very large, so a request would have to be stuck for 6000 seconds 
>> before you saw it.
>> 
>> The only way to tell whether that may be the cause is to use:
>> 
>> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>> 
>> so you can signal mod_wsgi to dump Python stack traces periodically and see 
>> if a specific request is stuck somewhere.
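
For a feel of what such a dump involves, here is a minimal standard-library sketch. It is not necessarily the code from the linked guide, and the function name format_all_thread_stacks is hypothetical. It relies on sys._current_frames(), which returns the current top frame of every thread in the interpreter.

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    # Map thread identifiers to names so the dump is easier to read.
    names = {thread.ident: thread.name for thread in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append(f"Thread {thread_id} ({names.get(thread_id, 'unknown')}):\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

Writing this text to the log at intervals, or when some trigger fires, shows whether the same request handler sits at the same line across several dumps, which is the sign of a stuck request.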
>> 
>> Graham
>> 
>> 
>>> Thanks, Graham.
>>> 
>>> 
>>> On Monday, August 31, 2020 at 4:56:46 PM UTC-4 Graham Dumpleton wrote:
>>> Look in the main Apache error log, not the virtual host, and you will 
>>> probably find a message about a segmentation fault or other error.
>>> 
>>> Where it is quite random like this, unless you can enable capture of the 
>>> core dump file and run a debugger (gdb) on it to get a stack trace, 
>>> it is going to be hard to track down.
>>> 
>>> The only other thing I can suggest is watching the process size to see if 
>>> it is getting so large that you are running out of memory and a failure to 
>>> allocate memory causes something to crash.
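
A minimal sketch of watching the process size on Linux, assuming the daemon process pid is known (for example from the mod_wsgi log lines). It reads the VmRSS field of /proc/<pid>/status; the function names parse_vm_rss_kb and read_rss_kb are hypothetical.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current RSS of a process on Linux; None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as handle:
            return parse_vm_rss_kb(handle.read())
    except OSError:
        # Process has gone away, pid is invalid, or /proc is not available.
        return None
```

Calling read_rss_kb() for the daemon pid every few seconds and logging the value would show whether memory climbs steadily before a failure.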
>>> 
>>> Graham
>>> 
>>> 
>>>> On 1 Sep 2020, at 3:28 am, David White <[email protected]> wrote:
>>>> 
>>>> Hello.  I am running Apache 2.4.43 with mod_wsgi 4.71 compiled against 
>>>> Python 3.8.3 (all manually compiled, not part of the RHEL 8.1 distro).  
>>>> Apache MPM is "event".
>>>> 
>>>> The application running in one of my virtual hosts will occasionally 
>>>> crash, but randomly.  Repeating the same request immediately will usually 
>>>> succeed.  The application may continue working for hours before randomly 
>>>> crashing again.
>>>> 
>>>> This started happening recently with an upgrade of the Flask and 
>>>> SQLAlchemy modules.  (again, all manually installed via pip)
>>>> 
>>>> A crash is reported this way in the vhost's error_log:
>>>> 
>>>> [Mon Aug 31 11:11:07.504787 2020] [wsgi:error] [pid 35980:tid 
>>>> 140546546816768] [client 10.83.210.200:47266] 
>>>> Truncated or oversized response headers 
>>>> received from daemon process 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>>> [Mon Aug 31 11:11:07.504806 2020] [wsgi:error] [pid 35980:tid 
>>>> 140546571994880] [client 10.31.82.142:35810] 
>>>> Truncated or oversized response headers received from daemon process 
>>>> 'sgm-prod': /apps/www/sgm/wsgi/SGM/run.py
>>>> [Mon Aug 31 11:11:08.512262 2020] [wsgi:info] [pid 40601:tid 
>>>> 140547110185856] mod_wsgi (pid=40601): Attach interpreter ''.
>>>> [Mon Aug 31 11:11:08.514415 2020] [wsgi:info] [pid 40601:tid 
>>>> 140547110185856] mod_wsgi (pid=40601): Adding '/apps/www/sgm/wsgi/SGM' to 
>>>> path.
>>>> [Mon Aug 31 11:11:08.514526 2020] [wsgi:info] [pid 40601:tid 
>>>> 140547110185856] mod_wsgi (pid=40601): Adding 
>>>> '/apps/vmscan/.virtualenvs/sgm/lib/python3.8/site-packages' to path.
>>>> [Mon Aug 31 11:11:08.515520 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546715096832] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 0 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515575 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546706704128] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 1 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515629 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546698311424] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 2 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515702 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546689918720] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 3 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515737 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546681526016] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 4 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515766 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546673133312] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 5 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515795 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546664740608] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 6 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515821 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546656347904] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 7 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515853 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546647955200] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 8 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515921 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546639562496] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 9 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515948 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546631169792] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 10 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.515972 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546622777088] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 11 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516002 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546614384384] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 12 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516036 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546605991680] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 13 in daemon process 'sgm-prod'.
>>>> [Mon Aug 31 11:11:08.516061 2020] [wsgi:debug] [pid 40601:tid 
>>>> 140546597598976] src/server/mod_wsgi.c(9118): mod_wsgi (pid=40601): 
>>>> Started thread 14 in daemon process 'sgm-prod'.
>>>> 
>>>> The WSGI portion of the configuration for the vhost in Apache looks like 
>>>> this:
>>>> 
>>>> WSGIPassAuthorization On
>>>> LogLevel info wsgi:trace6
>>>> SetEnv SGM_PRODUCTION  1
>>>> SetEnv SGM_USE_ORACLE  1
>>>> WSGIDaemonProcess sgm-prod user=sgmuser group=sgmgroup threads=15 \
>>>>     python-home=/apps/sgmuser/.virtualenvs/sgm \
>>>>     python-path=/apps/www/sgm/wsgi/SGM:/apps/sgmuser/.virtualenvs/sgm/lib/python3.8/site-packages \
>>>>     socket-timeout=6000
>>>> WSGIScriptAlias / /apps/www/sgm/wsgi/SGM/run.py
>>>> <Directory /apps/www/sgm/wsgi/SGM>
>>>>     Require all granted
>>>>     AllowOverride AuthConfig
>>>>     WSGIProcessGroup sgm-prod
>>>>     WSGIApplicationGroup %{GLOBAL}
>>>> </Directory>
>>>> 
>>>> The main Apache error log shows nothing relevant, and the 
>>>> application-specific logs (with Python logging messages) show only that 
>>>> the request was made.  There is no exception traceback data being logged 
>>>> (I'm guessing the app is crashing before that can happen.)
>>>> 
>>>> Given how transitory this error is, I'm not sure how else to configure 
>>>> Apache or the SGM app itself to get more details about why the "sgm-prod" 
>>>> process seems to be crashing.
>>>> 
>>>> Any help would be appreciated!  Thanks.
>>>> 
>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected] <>.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/modwsgi/8df00cf3-abeb-43a8-b14a-6666af322a9an%40googlegroups.com?utm_medium=email&utm_source=footer>.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/modwsgi/15A2051C-A443-42EC-BBC0-B2F61AAC5D89%40gmail.com.
