Here is part of the main Apache error log from when the 503 happened.

[Wed Oct 29 12:56:26.727197 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00287: server is within MinSpareThreads of MaxRequestWorkers, consider raising the MaxRequestWorkers setting
[Wed Oct 29 12:56:30.730902 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

What does it mean? Can it be solved by raising "MaxRequestWorkers"?

Thanks.
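For context, AH00287 means the server is within MinSpareThreads of the MaxRequestWorkers ceiling and AH00286 means it has reached it, so all allowed request threads are busy and further connections have to queue. Raising the ceiling can help if the machine has memory to spare for more processes, but with the worker MPM, MaxRequestWorkers also has to fit within ServerLimit x ThreadsPerChild. A minimal sketch using the ThreadsPerChild value of 25 shown further down the thread; the numbers are purely illustrative:

<IfModule mpm_worker_module>
    # MaxRequestWorkers may not exceed ServerLimit x ThreadsPerChild, so when
    # raising it past that product, raise ServerLimit as well.
    ServerLimit            10
    ThreadsPerChild        25
    MaxRequestWorkers      250
</IfModule>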
On Thursday, October 30, 2014 10:12:19 AM UTC+8, Kelvin Wong wrote:

> Hi Graham
>
> I upgraded mod_wsgi to 4.3.0. The 503 situation happened again. The error message is the same as before, but this time there is no more "Timeout when reading response headers from daemon process" before the "(11)Resource temporarily unavailable".
> Is there anything I can do to prevent this kind of situation? Or is there any way to make Apache self-heal?
>
> Thanks.
>
> On Wednesday, October 29, 2014 10:36:13 AM UTC+8, Graham Dumpleton wrote:
>>
>> On 28/10/2014, at 10:58 PM, Kelvin Wong <[email protected]> wrote:
>>
>> do you actually have figures on what the memory usage of the Apache child worker processes grows to?
>>
>> I do. I used New Relic to monitor the system resource usage. I found that as time goes on, the Apache processes take a lot of memory. That's why I want to control the memory usage of Apache.
>>
>> Okay, but where in New Relic are you monitoring this? I am concerned now as to whether you are even looking at just the Apache child worker processes that MaxConnectionsPerChild pertains to.
>>
>> If you were looking at the host breakout chart on the overview dashboard for the WSGI application being monitored by the Python web application agent, and you are using daemon mode, then what you are looking at is the memory taken by the mod_wsgi daemon processes and not the Apache child worker processes. As a consequence the MaxConnectionsPerChild directive doesn't apply.
>>
>> If you were looking at the server monitoring charts and looking at the Apache httpd/apache2 process, then that is all processes under Apache, which counts both the Apache child worker processes and the mod_wsgi daemon processes. If you relied on those charts, you can't tell whether it is the Apache child processes or the mod_wsgi daemon processes.
>>
>> So you can't, from either the Python web application agent or the server monitoring agent, tell how much memory is being used by just the Apache child worker processes.
>>
>> In the chart I included, which you can still see below, that is relying on a platform plugin agent for Apache/mod_wsgi. Unlike the others, it does pull out memory just for the Apache child worker processes. I then created a custom dashboard which includes charts for metrics from both the Python web application agent and the Apache/mod_wsgi platform plugin so I can cross compare them. That is how I got all the charts I showed.
>>
>> So right now I am questioning whether you should be using MaxConnectionsPerChild, as it is more likely that you may be looking at the size of the mod_wsgi daemon processes which actually contain your WSGI application.
>>
>> Also, my application is mainly APIs for a mobile application which involve uploading files/images. I found that there are a lot of IOErrors occurring, as it seems the uploads are unexpectedly terminated by the mobile application.
>> Do you have any suggestions on this?
>>
>> You can't stop connections being dropped, especially with mobile agents.
>>
>> What size are the images?
>>
>> One thing you can do, and which is actually a good idea overall independent of your specific situation, is to place an nginx front end proxy in front of Apache/mod_wsgi. The preferable way of configuring nginx is to have it use HTTP/1.1 and keep alive connections to Apache.
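Circling back to the point above about MaxConnectionsPerChild only pertaining to the Apache child worker processes: the WSGI application itself lives in the mod_wsgi daemon processes, which are recycled by a separate option on WSGIDaemonProcess. A rough sketch of the two knobs side by side, with illustrative values and the process group name that appears further down the thread (recycling processes is only a workaround for memory growth, not a fix for its cause):

# Recycles only the Apache child worker processes (the proxying/static file side).
MaxConnectionsPerChild 1000

# Recycles the mod_wsgi daemon processes that actually run the WSGI application
# after the given number of requests.
WSGIDaemonProcess site-1 threads=25 maximum-requests=1000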
>> To use HTTP/1.1 keep alive connections from nginx you have to be on top of understanding your configuration though. If you aren't, you are better off using the default of HTTP/1.0 for the proxy connections from nginx to Apache.
>>
>> Either way, the reason nginx helps is that when doing proxying, nginx can pre buffer up to a certain amount of request content and, if the request content is below the limit, will only bother proxying a request to Apache when it has successfully received it all. Thus Apache will not get troubled by any requests which got dropped, and so the IOError issue can be cut dramatically.
>>
>> In short, nginx helps to isolate Apache from slow HTTP clients and can make Apache perform better with fewer resources.
>>
>> And will these kinds of requests be kept in memory forever because they were handled incorrectly, and make the memory usage grow?
>>
>> No, they aren't held indefinitely.
>>
>> The problem with slow HTTP clients is that although no data is coming through, the connection is still held open until a timeout occurs based on the Timeout directive, and then the connection is dropped.
>>
>> What are you using for the Timeout directive?
>>
>> The compiled in default for Timeout is 60 seconds, but the sample configuration files often have 300 seconds. 300 seconds is way too high, and for many situations 60 seconds is also too much, but you have to be a bit careful about dropping it too low.
>>
>> This again though is where nginx as a front end proxy helps, because the request would simply never get through to Apache if the content wasn't coming through and the expected content was under the limit.
>>
>> Yes, nginx still has to deal with the hung connection, but it is much more efficient at that than Apache, as nginx uses an async event driven system to manage many connections in one thread whereas Apache uses a thread per connection.
>>
>> So what happens is the following:
>>>
>>> 1. Apache graceful restart is triggered.
>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on the Apache generation number.
>>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name as was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>
>> Is there any way to avoid an Apache graceful restart? Is an Apache graceful restart triggered by "MaxConnectionsPerChild" or other settings? If so, is it better to control this with "maximum-requests" in the mod_wsgi settings?
>>
>> The maximum-requests option pertains to the mod_wsgi daemon mode processes.
>>
>> This is where you have to correctly identify which processes are actually growing, the Apache child worker processes or the mod_wsgi daemon processes that your WSGI application is running in.
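One practical way of doing that identification on the host itself, separate from New Relic, is to give the mod_wsgi daemon processes their own display-name so they show up under a distinct label in ps/top, letting their resident memory be watched separately from the plain apache2 child worker processes. A sketch, reusing the process group name that appears further down the thread (the label itself is arbitrary):

# With display-name=%{GROUP} the daemon processes appear in ps/top as
# "(wsgi:site-1)" instead of the generic apache2 name, so their memory
# usage can be read separately from the Apache child worker processes.
WSGIDaemonProcess site-1 display-name=%{GROUP} threads=25
WSGIProcessGroup site-1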
>> If it is the one your WSGI application is in, there are many things you could be doing wrong which would cause memory usage to keep growing.
>>
>> You might even be encountering bugs in third party packages you use. Django for example, until at least 1.6.?, has had an issue with its signal mechanism that could result in deadlocks when garbage collection is being done. This could lock up a request thread, but then also cause the garbage collector to not run again. The result being that memory usage could keep growing and growing, since the garbage collector will never be able to reclaim objects.
>>
>> So the big question still is, which processes are the ones growing in memory usage? Only then can I say what you really need to do and give suggestions on how to track it down.
>>
>> For reference, the Apache/mod_wsgi platform plugin is detailed at:
>>
>> https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0
>>
>> The Django GC deadlock issue as it came to our attention, with links to the Django bug reports, can be found at:
>>
>> https://discuss.newrelic.com/t/background-thread-slowly-leaks-memory/2170
>>
>> Graham
>>
>> Are you using any Apache modules for implementing caching in Apache?
>>
>> No. I just have application-level caching (ORM caching and memcache for some of the requests).
>>
>> On Tuesday, October 28, 2014 6:21:22 PM UTC+8, Graham Dumpleton wrote:
>>>
>>> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
>>>
>>> Hi Graham,
>>>
>>> Thanks a lot for your detailed explanations.
>>>
>>> I used to reload the Apache processes instead of restarting them. So is there any relation to the "MaxConnectionsPerChild" setting, such that when a process hits the limit, it restarts the child process?
>>>
>>> There shouldn't be, as MaxConnectionsPerChild only pertains to the Apache child worker processes and the number of connections. When a specific Apache child worker process is restarted, it is a form of graceful restart, but the Apache configuration isn't being re-read by Apache as a whole, so the 'generation' counter inside Apache wouldn't change and so the name of the proxy socket file wouldn't change either. So that option shouldn't cause those errors.
>>>
>>> If so, is there any alternative to this setting? I used this setting to bound the memory usage of Apache.
>>>
>>> The issue is why you would be seeing memory growth in the Apache child worker processes to start with, and how much. By rights they shouldn't keep increasing in memory usage. They can increase in memory a bit, but then should plateau. For example:
>>>
>>> If using mod_wsgi daemon mode where the Apache child worker processes are only proxying requests or serving static files, this growth up to a ceiling, as reflected in 'Apache Child Process Memory (Average)', is generally the result of the per worker thread memory pools that Apache uses.
>>>
>>> The problem is that there may not be a limit on the upper size of the per worker thread memory pools, that is, the size is unbounded. This is especially the case in Apache 2.2 as the compiled in default is unlimited, so if the configuration file doesn't set it, then the ceiling can grow to be quite high as it depends a lot on how much data may get buffered in memory due to slow HTTP clients.
>>>
>>> In Apache 2.4 there is now at least a compiled in default, but it still may be higher than desirable.
>>> In Apache 2.4 that default is:
>>>
>>> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
>>>
>>> This means that for each worker thread in a process, the memory pool associated with it can retain 2MB of memory. As you have 25 worker threads, that means these memory pools can consume up to 50MB per worker process. You then have up to 6 worker processes, so that is 300MB in the worst case if the throughput was enough to keep all the processes active and Apache didn't start killing them off as not needed. But then the number of processes will not go all the way back down to 1 due to MaxSpareThreads being 75, thus it will always keep at least 3 processes around.
>>>
>>> Anyway, if memory usage in the Apache child worker processes is a big issue, especially where you are actually delegating the WSGI application to run in mod_wsgi daemon mode, meaning that the Apache child worker processes should be able to run quite lean, then you can adjust MaxMemFree down from the quite high default in Apache 2.4 (and non-existent default in Apache 2.2).
>>>
>>> There are two other changes you can also make related to memory usage in the Apache child worker processes.
>>>
>>> The first is, if you are always using mod_wsgi daemon mode and never requiring embedded mode, to turn off initialisation of the Python interpreter in the Apache child worker processes.
>>>
>>> The second is that on Linux the default per thread stack size is 8MB. This much shouldn't usually be required and really only counts towards virtual memory usage, but some VPS systems count virtual memory for billing purposes, so it can become a problem that way.
>>>
>>> So rather than thinking that MaxConnectionsPerChild is the only solution, use directives to control how much memory may be getting used and/or retained by the worker threads.
>>>
>>> In mod_wsgi-express, for example, the configuration it generates by default, as a saner default where mod_wsgi daemon mode is always used, is:
>>>
>>> # Turn off Python interpreter initialisation in the Apache child worker
>>> # processes as it is not required if using mod_wsgi daemon mode exclusively.
>>> # This will be overridden if use of Python scripts for access
>>> # control/authentication/authorisation is enabled, as those have to run in
>>> # the Apache child worker processes.
>>>
>>> WSGIRestrictEmbedded On
>>>
>>> # Set a limit on the amount of memory which will be retained in per worker
>>> # memory pools. More memory than this can still be used if need be, but when
>>> # no longer required and above this limit, it will be released back to the
>>> # process level memory allocator for reuse rather than being retained for
>>> # exclusive use by the thread, with the risk of a higher memory level.
>>>
>>> MaxMemFree 64
>>>
>>> # Reduce the notional per thread stack size for all the worker threads. This
>>> # relates more to virtual memory usage, but some VPS systems count virtual
>>> # memory for billing purposes.
>>>
>>> ThreadStackSize 262144
>>>
>>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Has mod_wsgi 4.3.0 improved handling of segmentation fault errors?
>>>
>>> As far as what was being discussed before, version 4.3.0 makes the error messages more descriptive of the issue and introduces new messages where they weren't able to be distinguished before. It wasn't possible to do this as well before, as Apache code was being relied on to handle reading back data from the mod_wsgi daemon processes.
>>> The mod_wsgi code now does this itself and can better control things. In mod_wsgi version 4.4.0, if I can get the changes completed (more likely 4.5.0), there will be better messages again for some things, as error codes that were previously being thrown away by Apache will actually be known.
>>>
>>> So it will not change how a segmentation fault is handled, as that can't be changed, just the wording of the error message when a mod_wsgi daemon process may have died or was shut down.
>>>
>>> I have thrown a lot of information at you here, but do you actually have figures on what the memory usage of the Apache child worker processes grows to? Are you using any Apache modules for implementing caching in Apache?
>>>
>>> Graham
>>>
>>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>>>>
>>>> Would suggest upgrading to mod_wsgi 4.3.0 if you can, as the error messages when there are communication problems between the Apache child worker processes and the mod_wsgi daemon processes have been improved.
>>>>
>>>> More comments below.
>>>>
>>>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>>>>>
>>>>> Hi Graham and everyone else
>>>>>
>>>>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>>>>> I found that the server started returning 504 and then 503, and the following errors showed up.
>>>>> I researched some issues related to it, and even added "WSGISocketPrefix /var/run/apache2/wsgi", but the issue still occurred.
>>>>> I have no idea why it happened. Can anyone give some direction on this issue?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> apache error log
>>>>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
>>>>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>>>>
>>>> This one can occur when the mod_wsgi daemon process crashes. There should be a segmentation fault error message or similar in the main Apache error log (not the VirtualHost specific log).
>>>>
>>>> It can also occur if there are incomplete requests still running when a mod_wsgi daemon process is shut down on being restarted, due to the WSGI script file being touched or if Apache was restarted. In the latter case, the mod_wsgi daemon process would have had to have been killed off by Apache before the Apache child worker process which was proxying to it had been. This can especially be the case if an Apache graceful restart was being done.
>>>>
>>>>> occasionally
>>>>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.
>>>>
>>>> This can also be due to an Apache graceful restart being done while there were keep alive connections being handled from a HTTP client.
>>>> In an Apache graceful restart, because of how Apache handles the mod_wsgi daemon processes, they don't have a graceful shutdown in the same way as the Apache child worker processes.
>>>>
>>>> So what happens is the following:
>>>>
>>>> 1. Apache graceful restart is triggered.
>>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>>>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on the Apache generation number.
>>>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name as was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>>>
>>>> The name of the proxy socket changes across Apache restarts because otherwise you could have Apache child worker processes under an old configuration sending requests to a mod_wsgi daemon process using the new configuration, which could cause problems including security issues. There are therefore specific protections in place to ensure that only Apache child worker processes and mod_wsgi daemon mode processes created against the same Apache configuration generation can talk to each other.
>>>>
>>>>> wsgi config for that site
>>>>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>>>>> WSGIProcessGroup site-1
>>>>> WSGIApplicationGroup %{GLOBAL}
>>>>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>>>>>
>>>>> worker.conf
>>>>> <IfModule mpm_worker_module>
>>>>>     StartServers             2
>>>>>     MinSpareThreads         25
>>>>>     MaxSpareThreads         75
>>>>>     ThreadLimit             64
>>>>>     ThreadsPerChild         25
>>>>>     MaxRequestWorkers      150
>>>>>     MaxConnectionsPerChild 1000
>>>>> </IfModule>
>>>>
>>>> So my best guess is that you are doing Apache graceful restarts when these errors are occurring.
>>>>
>>>> Are you using Apache graceful restarts as suspected?
>>>>
>>>> Graham
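Pulling the Apache-side directives mentioned in this thread together in one place, roughly as they would sit alongside the worker.conf shown above. The values are simply the ones quoted in the thread (the mod_wsgi-express defaults and the compiled-in Timeout), not a tuned recommendation for this particular site:

# Don't initialise the Python interpreter in the Apache child worker
# processes, since the WSGI application only runs in daemon mode here.
WSGIRestrictEmbedded On

# Cap the memory retained by the per worker thread memory pools (value in KB).
MaxMemFree 64

# Smaller notional per thread stack size for the worker threads (in bytes).
ThreadStackSize 262144

# Compiled-in default; sample configs often ship 300, which is far too high.
Timeout 60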
