Re: [modwsgi] mod_wsgi returning 503 service unavailable

Kelvin Wong Wed, 29 Oct 2014 00:54:07 -0700

I just set up the Apache/mod_wsgi platform plugin to collect the resource 
usage of mod_wsgi.
It needs some time to collect the data in production. I will get back to 
you when I have some insights.


Thanks for your help. :)


On Wednesday, October 29, 2014 10:36:13 AM UTC+8, Graham Dumpleton wrote:
>
>
> On 28/10/2014, at 10:58 PM, Kelvin Wong <[email protected]> wrote:
>
> do you actually have figures on what the memory usage of the Apache child 
> worker processes grows to?
>
> I do. I used New Relic to monitor the system resource usage. I found that 
> over time the Apache processes take a lot of memory. That's why I want to 
> control Apache's memory usage.
>
>
> Okay, but where in New Relic are you monitoring this? I am concerned now 
> as to whether you are even looking at just the Apache child worker 
> processes that MaxConnectionsPerChild pertains to.
>
> If you were looking at the host breakout chart on the overview dashboard 
> for the WSGI application being monitored by the Python web application 
> agent, and you are using daemon mode, then what you are looking at is the 
> memory taken by the mod_wsgi daemon processes and not the Apache child 
> worker processes. As a consequence the MaxConnectionsPerChild directive 
> doesn't apply.
>
> If you were looking at the server monitoring charts and looking at the 
> Apache httpd/apache2 process, then that is all processes under Apache, 
> which counts both the Apache child worker processes and the mod_wsgi daemon 
> processes. If you relied on those charts, you can't tell whether it is the 
> Apache child processes or mod_wsgi daemon processes.
>
> So you can't tell from either the Python web application agent or the 
> server monitoring agent how much memory is being used by just the Apache 
> child worker processes.
>
> The chart I included, which you can still see below, relies on a 
> platform plugin agent for Apache/mod_wsgi. Unlike the others, it does pull 
> out memory just for the Apache child worker processes. I then created a 
> custom dashboard which includes charts for metrics from both the Python web 
> application agent and the Apache/mod_wsgi platform plugin so I can cross 
> compare them. That is how I got all the charts I showed.
>
> So right now I am questioning whether you should be 
> using MaxConnectionsPerChild, as it is more likely that you are looking 
> at the size of the mod_wsgi daemon processes which actually contain your 
> WSGI application.
>
> Also, my application is mainly APIs for a mobile application, which involves 
> uploading files/images. I found that a lot of IOErrors occur, as it seems 
> uploads are unexpectedly terminated by the mobile application.
> Do you have any suggestions on this?
>
>
> You can't stop connections being dropped, especially with mobile agents.
>
> What size are the images?
>
> One thing you can do, and which is actually a good idea overall independent 
> of your specific situation, is to place an nginx front end proxy in front of 
> Apache/mod_wsgi. The preferable way of configuring nginx is to have it use 
> HTTP/1.1 and keep alive connections to Apache. You have to be on top of 
> understanding your configuration though. If you aren't, you are better off 
> using the default of HTTP/1.0 for the proxy connections from nginx to Apache.
>
> Either way, the reason nginx helps is that when doing proxying, nginx can 
> pre buffer up to a certain amount of request content and will only bother 
> proxying a request to Apache, if request content is below the limit, when 
> it has successfully received it all. Thus Apache will not get troubled by 
> any requests which got dropped and so the IOError issue can be cut 
> dramatically.
>
> In short, nginx helps to isolate Apache from slow HTTP clients and can 
> make Apache perform better with fewer resources.
>
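Separately from the nginx buffering described above, the application itself can treat a dropped upload as a client error instead of letting the IOError surface as an unhandled exception. The following is only a minimal sketch for a plain WSGI callable; the helper name read_body and the choice of a 400 response are illustrative assumptions, and a framework such as Django reads the request body itself, so the equivalent handling would sit in framework-level code.

```python
def read_body(environ):
    """Return the full request body, or None if the upload was cut short."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    try:
        data = environ["wsgi.input"].read(length)
    except IOError:
        # The client (for example a mobile app) dropped the connection
        # while the body was still being read.
        return None
    if len(data) < length:
        # Fewer bytes arrived than the client declared.
        return None
    return data


def application(environ, start_response):
    body = read_body(environ)
    if body is None:
        # The client has usually gone away already; returning cleanly just
        # avoids a traceback for what is not an application bug.
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"incomplete upload\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("received %d bytes\n" % len(body)).encode("ascii")]
```

This does not stop connections being dropped; it only keeps such requests from being reported as application failures.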
> And will these kinds of requests stay in memory forever because they were 
> handled incorrectly, and make the memory usage grow?
>
>
> No they aren't held indefinitely.
>
> The problem with slow HTTP clients is that even when no data is coming 
> through, the connection is still held open until a timeout occurs based on 
> the Timeout directive, and only then is the connection dropped.
>
> What are you using for the Timeout directive?
>
> The compiled in default for Timeout is 60 seconds, but the sample 
> configuration files often have 300 seconds. 300 seconds is way too high and 
> for many situations 60 seconds is also too much, but you have to be a bit 
> careful about dropping it too low.
>
> This again though is where nginx as a front end proxy helps, because the 
> request would simply never get through to Apache if the content wasn't 
> coming through and expected content was under the limit.
>
> Yes, nginx still has to deal with the hung connection, but it is much more 
> efficient at that than Apache, as nginx uses an async event driven system to 
> manage many connections in one thread whereas Apache uses a thread per 
> connection.
>
> So what happens is the following:
>>
>> 1. Apache graceful restart is triggered.
>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to 
>> signal graceful shutdown.
>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to 
>> signal shutdown.
>> 4. The mod_wsgi daemon processes complete their requests and restart. In 
>> the next incarnation of the mod_wsgi daemon processes after an Apache 
>> restart they expect a different path for the proxy socket, with the number 
>> at the end increasing based on Apache generation number.
>> 5. The Apache child worker process because it was in a graceful restart 
>> mode, operates on the understanding that it can keep handling any requests 
>> on a keep alive socket connection from a HTTP client until there are no 
>> more. It therefore takes next request on same connection and tries to 
>> connect to mod_wsgi daemon process, but using the proxy socket name as was 
>> used before, but that name has changed for the next Apache configuration 
>> generation and no longer exists, thus it fails.
>>
>
> Is there any way to avoid an Apache graceful restart? Is an Apache graceful 
> restart triggered by "MaxConnectionsPerChild" or by other settings?
> If so, is it better to control this with "maximum-requests" in the mod_wsgi 
> settings?
>
>
> The maximum-requests option pertains to mod_wsgi daemon mode processes.
>
> This is where you have to correctly identify which processes are actually 
> growing, the Apache child worker processes or the mod_wsgi daemon processes 
> that your WSGI application is running in.
>
> If it is the one your WSGI application is in, there are many things you 
> could be doing wrong which would cause memory usage to keep growing.
>
> You might even be encountering bugs in third party packages you use. 
> Django for example until at least 1.6.? has had an issue with its signal 
> mechanism that could result in deadlocks when garbage collection is being 
> done. This could lock up a request thread, but then also cause the garbage 
> collector to not run again. The result being that memory usage could keep 
> growing and growing since the garbage collector will never be able to 
> reclaim objects.
>
> So the big question still is, which processes are the ones growing in 
> memory usage? Only then can I say what you really need to do and give 
> suggestions on how to track it down.
>
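One rough way to answer that question without any monitoring product is to split resident memory from ps output into Apache processes and named mod_wsgi daemon processes. This is only a sketch under assumptions: it relies on the display-name option from the WSGIDaemonProcess line quoted in this thread making the daemon process show up in ps under that name, the "apache" bucket also includes the Apache parent process, and the function name rss_by_group is hypothetical.

```python
def rss_by_group(ps_output, daemon_names):
    """Sum resident memory (KB) per group from `ps -e -o rss= -o args=` text.

    daemon_names: display-name values given to WSGIDaemonProcess.
    """
    totals = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        rss_kb = int(parts[0])
        name = parts[1].split()[0]
        if name in daemon_names:
            label = "daemon:" + name
        elif "apache2" in name or "httpd" in name:
            label = "apache"  # parent plus child worker processes
        else:
            continue  # unrelated process
        totals[label] = totals.get(label, 0) + rss_kb
    return totals


# Possible use on the server itself (not executed here):
#   import subprocess
#   text = subprocess.check_output(
#       ["ps", "-e", "-o", "rss=", "-o", "args="]).decode()
#   print(rss_by_group(text, {"site-1"}))
```

Sampling this periodically should show whether growth is in the "apache" bucket or in a daemon process group, which decides whether MaxConnectionsPerChild or maximum-requests is even the relevant setting.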
> For reference, the Apache/mod_wsgi platform plugin is detailed at:
>
>     https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0
>
> The Django GC deadlock issue as it came to our attention, with links to 
> Django bug reports, can be found at:
>
>     https://discuss.newrelic.com/t/background-thread-slowly-leaks-memory/2170
>
> Graham
>
> Are you using any Apache modules for implementing caching in Apache?     
> No. I just have application-level caching (ORM caching and memcache for 
> some of the requests).
>
>
> On Tuesday, October 28, 2014 6:21:22 PM UTC+8, Graham Dumpleton wrote:
>>
>>
>> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
>>
>> Hi Graham,
>>
>> Thanks a lot for your detailed explanations.
>>
>> I usually reload the Apache processes instead of restarting them.
>> So is this related to the "MaxConnectionsPerChild" setting, in that when 
>> a process reaches the limit, the child process is restarted?
>>
>>
>> There shouldn't be, as MaxConnectionsPerChild only pertains to the Apache 
>> child worker processes and the number of connections. When a specific 
>> Apache child worker process is restarted, it is a form of graceful 
>> restart, but the Apache configuration isn't being reread by Apache as a 
>> whole, so the 'generation' counter inside Apache wouldn't change and so the 
>> name of the proxy socket file wouldn't change either. So that option 
>> shouldn't cause those errors.
>>
>> If so, any alternative to this setting? I used this setting to bound the 
>> memory usage of apache.
>>
>>
>> The issue is why you would be seeing memory growth in the Apache child 
>> worker processes to start with and how much. By rights they shouldn't keep 
>> increasing in memory usage. They can increase in memory a bit, but then 
>> should plateau. For example:
>>
>>
>> If using mod_wsgi daemon mode where the Apache child worker processes are 
>> only proxying requests or serving static files, this growth up to a ceiling 
>> as reflected in 'Apache Child Process Memory (Average)' is generally the 
>> result of the per worker thread memory pools that Apache uses.
>>
>> The problem is that there may be no upper limit on the size of the per 
>> worker thread memory pools, that is, the size is unbounded. This is 
>> especially the case in Apache 2.2 as the compiled in default is unlimited, 
>> so if the configuration file doesn't set it, then the ceiling can grow to 
>> be quite high as it depends a lot on how much data may get buffered in 
>> memory due to slow HTTP clients.
>>
>> In Apache 2.4 there is now at least a compiled in default, but it still 
>> may be higher than desirable. In Apache 2.4 that default is:
>>
>> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
>>
>> This means that for each worker thread in a process, the memory pool 
>> associated with it can retain 2MB of memory. As you have 25 worker threads, 
>> that means these memory pools can consume up to 50MB per worker process. 
>> You then have up to 6 worker processes. So that is 300MB in worst case if 
>> the throughput was enough to keep all the processes active and Apache didn't 
>> start killing them off as not needed. But then the number of processes will 
>> not go all the way back to 1 due to MaxSpareThreads being 75, thus it will 
>> always keep at least 3 processes around.
>>
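The arithmetic above can be reproduced with a few lines of Python. This is a simplified sketch, not a model of Apache's allocator: it uses only the worker.conf values and the Apache 2.4 default quoted in this thread, takes the floor of three idle processes from the explanation above rather than deriving it, and assumes MaxMemFree is expressed in KB; the function name is hypothetical.

```python
# Values from the worker.conf and the Apache 2.4 default quoted in this thread.
MAX_REQUEST_WORKERS = 150
THREADS_PER_CHILD = 25
MAX_SPARE_THREADS = 75
ALLOCATOR_MAX_FREE_DEFAULT = 2048 * 1024  # bytes retained per worker thread pool

MB = 1024 * 1024


def pool_retention_bytes(processes, threads_per_child, max_free_bytes):
    """Upper bound on memory retained by per worker thread memory pools."""
    return processes * threads_per_child * max_free_bytes


max_processes = MAX_REQUEST_WORKERS // THREADS_PER_CHILD  # 6 worker processes
idle_processes = MAX_SPARE_THREADS // THREADS_PER_CHILD   # 3 kept around when idle

per_process_mb = pool_retention_bytes(
    1, THREADS_PER_CHILD, ALLOCATOR_MAX_FREE_DEFAULT) // MB
worst_case_mb = pool_retention_bytes(
    max_processes, THREADS_PER_CHILD, ALLOCATOR_MAX_FREE_DEFAULT) // MB
idle_floor_mb = pool_retention_bytes(
    idle_processes, THREADS_PER_CHILD, ALLOCATOR_MAX_FREE_DEFAULT) // MB

# Same bound with a limit of 64 KB per pool (MaxMemFree 64).
with_limit_kb = pool_retention_bytes(
    max_processes, THREADS_PER_CHILD, 64 * 1024) // 1024

print(per_process_mb, worst_case_mb, idle_floor_mb, with_limit_kb)
```

With these inputs the bound is 50MB per process and 300MB across six processes, against 9600KB in total when each pool is limited to 64KB.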
>> Anyway, if memory usage in Apache child worker process is a big issue, 
>> especially where you are actually delegating the WSGI application to run in 
>> mod_wsgi daemon mode, meaning that the Apache child worker processes should 
>> be able to be run quite lean, then you can adjust MaxMemFree down from the 
>> quite high default in Apache 2.4 (and non existent in Apache 2.2). 
>>
>> There are two other changes you can also make related to memory usage in 
>> the Apache child worker processes.
>>
>> The first is if you are always using mod_wsgi daemon mode and never 
>> requiring embedded mode, then turn off initialisation of the Python 
>> interpreter in the Apache child worker processes.
>>
>> The second is that on Linux the default per thread stack size is 8MB. 
>> This much shouldn't usually be required and really only counts towards 
>> virtual memory usage, but some VPS systems count virtual memory for billing 
>> purposes so it can become a problem that way.
>>
>> So rather than thinking that MaxConnectionsPerChild is the only solution, 
>> use directives to control how much memory may be getting used and/or 
>> retained by the worker threads.
>>
>> In mod_wsgi-express for example, the configuration it generates, as a 
>> saner default where mod_wsgi daemon mode is always used, is:
>>
>> # Turn off Python interpreter initialisation in the Apache child worker
>> # processes, as it is not required if using mod_wsgi daemon mode exclusively.
>> # This will be overridden if use of Python scripts for access control,
>> # authentication or authorisation is enabled, as these have to run in the
>> # Apache child worker processes.
>>
>> WSGIRestrictEmbedded On
>>
>> # Set a limit on the amount of memory which will be retained in per worker
>> # memory pools. More memory than this can still be used if need be, but when
>> # no longer required and above this limit it will be released back to the
>> # process level memory allocator for reuse rather than being retained for
>> # exclusive use by the thread, with a risk of a higher memory level.
>>
>> MaxMemFree 64
>>
>> # Reduce the notional per thread stack size for all the worker threads. This
>> # relates more to virtual memory usage, but some VPS systems count virtual
>> # memory for billing purposes.
>>
>> ThreadStackSize 262144
>>
>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Does mod_wsgi 4.3.0 
>> improve the handling of segmentation faults?
>>
>>
>> As far as what was being discussed before, version 4.3.0 makes the error 
>> messages more descriptive of the issue and introduces new messages where 
>> they weren't able to be distinguished before. It wasn't possible to do this 
>> as well before as Apache code was being relied on to handle reading back 
>> data from mod_wsgi daemon processes. The mod_wsgi code now does this itself 
>> and can better control things. In mod_wsgi version 4.4.0 if I can get the 
>> changes completed (more likely 4.5.0), there will be better messages again 
>> for some things as error codes that were previously being thrown away by 
>> Apache will actually be known.
>>
>> So it will not change how a segmentation fault is handled as that can't 
>> be changed, just the wording of the error message when a mod_wsgi daemon 
>> process may have died or was shut down.
>>
>> I have thrown a lot of information at you here, but do you actually have 
>> figures on what the memory usage of the Apache child worker processes grows 
>> to? Are you using any Apache modules for implementing caching in Apache?
>>
>> Graham 
>>
>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>>>
>>> Would suggest upgrading to mod_wsgi 4.3.0 if you can as the error 
>>> messages when there are communication problems between Apache child worker 
>>> process and mod_wsgi daemon process have been improved.
>>>
>>> More comments below.
>>>
>>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>>>
>>>> Hi Graham and everyone else
>>>>
>>>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), 
>>>> OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>>>> I found that the server started returning 504 and then 503, and the 
>>>> following errors showed up.
>>>> I researched some related issues and even added "WSGISocketPrefix 
>>>> /var/run/apache2/wsgi", but the issue still occurred.
>>>> I have no idea why it happened. Can anyone give some direction on this 
>>>> issue?
>>>>
>>>> Thanks!
>>>>
>>>> apache error log
>>>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
>>>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>>>>
>>>
>>> This one can occur when the mod_wsgi daemon process crashes. There 
>>> should be a segmentation fault error message or similar in the main Apache 
>>> error log (not VirtualHost specific log).
>>>
>>> It can also occur if there are incomplete requests still running when a 
>>> mod_wsgi daemon process is shut down on being restarted due to the WSGI 
>>> script file being touched or if Apache was restarted. In the latter case, 
>>> the mod_wsgi daemon process would have had to have been killed off by 
>>> Apache before the Apache child worker process which was proxying to it had 
>>> been. This can especially be the case if an Apache graceful restart was 
>>> being done.
>>>  
>>>
>>>> occasionally
>>>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.
>>>>
>>>
>>> This can also be due to Apache graceful restart being done and there 
>>> were keep alive connections being handled from a HTTP client. In an Apache 
>>> graceful restart, because of how Apache handles the mod_wsgi daemon processes, 
>>> they don't have a graceful shutdown in the same way as Apache child worker 
>>> processes.
>>>
>>> So what happens is the following:
>>>
>>> 1. Apache graceful restart is triggered.
>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to 
>>> signal graceful shutdown.
>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to 
>>> signal shutdown.
>>> 4. The mod_wsgi daemon processes complete their requests and restart. In 
>>> the next incarnation of the mod_wsgi daemon processes after an Apache 
>>> restart they expect a different path for the proxy socket, with the number 
>>> at the end increasing based on Apache generation number.
>>> 5. The Apache child worker process because it was in a graceful restart 
>>> mode, operates on the understanding that it can keep handling any requests 
>>> on a keep alive socket connection from a HTTP client until there are no 
>>> more. It therefore takes next request on same connection and tries to 
>>> connect to mod_wsgi daemon process, but using the proxy socket name as was 
>>> used before, but that name has changed for the next Apache configuration 
>>> generation and no longer exists, thus it fails.
>>>
>>> The name of the proxy socket changes across Apache restarts because 
>>> otherwise you could have Apache child worker processes under an old 
>>> configuration sending requests to a mod_wsgi daemon process using the new 
>>> configuration, which could cause problems including security issues. There 
>>> are therefore specific protections in place to ensure that only Apache 
>>> child worker processes and mod_wsgi daemon mode processes created against 
>>> the same Apache configuration generation can talk to each other.
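The effect of the generation-dependent socket name can be illustrated by pulling the numeric fields out of the paths in the error log. This is only a sketch resting on an assumption: the thread confirms that a number in the name tracks the Apache generation, while the reading of the three fields as parent pid, generation and daemon group id, in that order, is a guess from the log lines and not something stated here. The function names and the "new" path in the usage note are hypothetical.

```python
import re

# Assumed layout: <prefix>.<pid>.<generation>.<id>.sock
SOCKET_RE = re.compile(
    r"^(?P<prefix>.+)\.(?P<pid>\d+)\.(?P<generation>\d+)\.(?P<ident>\d+)\.sock$"
)


def parse_socket_path(path):
    """Split a daemon socket path into (prefix, pid, generation, ident)."""
    match = SOCKET_RE.match(path)
    if match is None:
        raise ValueError("unrecognised socket path: %r" % (path,))
    return (
        match.group("prefix"),
        int(match.group("pid")),
        int(match.group("generation")),
        int(match.group("ident")),
    )


def same_generation(worker_path, daemon_path):
    """True if a worker's remembered path matches what the daemon listens on."""
    return parse_socket_path(worker_path) == parse_socket_path(daemon_path)
```

For example, a child worker process still holding '/var/run/apache2/wsgi.30188.7.3.sock' would not match a restarted daemon listening on a hypothetical '/var/run/apache2/wsgi.30188.8.3.sock', which is consistent with the "No such file or directory" error quoted earlier.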
>>>  
>>>
>>>> wsgi config for that site
>>>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>>>> WSGIProcessGroup site-1
>>>> WSGIApplicationGroup %{GLOBAL}
>>>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>>>>
>>>> worker.conf
>>>> <IfModule mpm_worker_module>
>>>>        StartServers                 2
>>>>        MinSpareThreads             25
>>>>        MaxSpareThreads             75
>>>>        ThreadLimit                 64
>>>>        ThreadsPerChild             25
>>>>        MaxRequestWorkers          150
>>>>        MaxConnectionsPerChild    1000
>>>> </IfModule>
>>>>
>>>
>>> So my best guess is that you are doing Apache graceful restarts when 
>>> these are occurring.
>>>
>>> Are you using Apache graceful restarts as suspected?
>>>
>>> Graham 
>>>
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/modwsgi.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
