On 28/10/2014, at 10:58 PM, Kelvin Wong <[email protected]> wrote:

> do you actually have figures on what the memory usage of the Apache child 
> worker processes grows to?
> 
> I do. I used New Relic to monitor the system resource usage. I found that as 
> time goes on, the Apache processes take a lot of memory. That's why I want to 
> control the memory usage of Apache.

Okay, but where in New Relic are you monitoring this? I am concerned now as to 
whether you are even looking at just the Apache child worker processes that 
MaxConnectionsPerChild pertains to.

If you were looking at the host breakout chart on the overview dashboard for 
the WSGI application being monitored by the Python web application agent, and 
you are using daemon mode, then what you are looking at is the memory taken by 
the mod_wsgi daemon processes and not the Apache child worker processes. As a 
consequence the MaxConnectionsPerChild directive doesn't apply.

If you were looking at the server monitoring charts and looking at the Apache 
httpd/apache2 process, then that covers all processes under Apache, which 
counts both the Apache child worker processes and the mod_wsgi daemon 
processes. If you rely on those charts, you can't tell whether it is the 
Apache child worker processes or the mod_wsgi daemon processes.

So you can't, from either the Python web application agent or the server 
monitoring agent, tell how much memory is being used by just the Apache child 
worker processes.

The chart I included, which you can still see below, relies instead on a 
platform plugin agent for Apache/mod_wsgi. Unlike the others, it does pull out 
memory for just the Apache child worker processes. I then created a custom 
dashboard which includes charts for metrics from both the Python web 
application agent and the Apache/mod_wsgi platform plugin so I can cross 
compare them. That is how I got all the charts I showed.

So right now I am questioning whether you should be using 
MaxConnectionsPerChild at all, as it is more likely that you are looking at the 
size of the mod_wsgi daemon processes, which actually contain your WSGI 
application.

> Also, my application mainly serves APIs for a mobile application, which 
> involves uploading files/images. I found that a lot of IOError exceptions 
> occurred, as it seems the uploads are unexpectedly terminated by the mobile 
> application.
> Do you have any suggestions on this?

You can't stop connections being dropped, especially with mobile clients.

What size are the images?
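
On the application side, about the best you can do is catch the error and 
treat the upload as aborted, rather than letting it propagate as an unhandled 
exception. A minimal sketch for a Django view follows; the view name, response 
bodies and logging setup are illustrative assumptions, not taken from your 
code:

    import logging

    from django.http import HttpResponse, HttpResponseBadRequest

    log = logging.getLogger(__name__)

    def upload(request):
        try:
            # Reading the request content is what raises IOError when the
            # mobile client drops the connection mid upload.
            data = request.body
        except IOError:
            # Nothing useful can be sent back on a dead connection, so just
            # log it and treat the upload as aborted.
            log.info('client aborted upload')
            return HttpResponseBadRequest('upload aborted')
        return HttpResponse('received %d bytes' % len(data))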

One thing you can do, which is actually a good idea overall independent of 
your specific situation, is to place an nginx front end proxy in front of 
Apache/mod_wsgi. The preferable way of configuring nginx is to have it use 
HTTP/1.1 and keep alive connections to Apache. You have to be on top of 
understanding your configuration though. If you aren't, you are better off 
using the default of HTTP/1.0 for the proxy connections from nginx to Apache.

Either way, the reason nginx helps is that when proxying, nginx can buffer up 
to a certain amount of request content and, provided the request content is 
below that limit, will only proxy the request to Apache once it has 
successfully received it all. Thus Apache will not get troubled by requests 
which got dropped, and so the IOError issue can be cut dramatically.

In short, nginx helps to isolate Apache from slow HTTP clients and can make 
Apache perform better with fewer resources.
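
As a rough sketch only, such a front end might look like the following. The 
backend port, buffer size and upload limit are assumptions you would need to 
tune for your own image sizes:

    # nginx front end proxying to Apache/mod_wsgi, assumed to be on port 8080.
    upstream apache_backend {
        server 127.0.0.1:8080;
        # Pool of idle keep alive connections to Apache; only relevant when
        # using HTTP/1.1 for the proxy connections.
        keepalive 16;
    }

    server {
        listen 80;

        # Reject uploads larger than this outright.
        client_max_body_size 20m;
        # Buffer this much request content in memory before spilling to disk.
        client_body_buffer_size 128k;

        location / {
            proxy_pass http://apache_backend;
            # Use HTTP/1.1 keep alive connections to Apache. Drop these two
            # lines to fall back to the default of HTTP/1.0.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }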

> And will these kinds of requests be kept in memory forever if handled 
> incorrectly, and so make the memory usage grow?

No, they aren't held indefinitely.

The problem with a slow HTTP client is that although no data is coming 
through, it still holds the connection open until a timeout occurs, based on 
the Timeout directive, and then the connection is dropped.

What are you using for the Timeout directive?

The compiled in default for Timeout is 60 seconds, but the sample 
configuration files often have 300 seconds. 300 seconds is way too high, and 
for many situations 60 seconds is also too much, but you have to be a bit 
careful about dropping it too low.
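
For example, as a sketch only, with the actual value depending on how long 
your slowest legitimate clients, such as mobile uploads over poor connections, 
need:

    # Drop a connection when no data has flowed for this many seconds.
    # The value of 30 here is an illustrative assumption, not a recommendation.
    Timeout 30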

This again though is where nginx as a front end proxy helps, because the 
request would simply never get through to Apache if the content wasn't coming 
through, provided the expected content was under the buffering limit.

Yes, nginx still has to deal with the hung connection, but it is much more 
efficient at that than Apache, as nginx uses an async event driven system to 
manage many connections in one thread, whereas Apache uses a thread per 
connection.

> So what happens is the following:
> 
> 1. Apache graceful restart is triggered.
> 2. Apache parent process sends SIGUSR1 to Apache child worker process to 
> signal graceful shutdown.
> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal 
> shutdown.
> 4. The mod_wsgi daemon processes complete their requests and restart. In the 
> next incarnation of the mod_wsgi daemon processes after an Apache restart, 
> they expect a different path for the proxy socket, with the number at the 
> end increasing based on the Apache generation number.
> 5. The Apache child worker process, because it was in graceful restart mode, 
> operates on the understanding that it can keep handling any requests on a 
> keep alive socket connection from a HTTP client until there are no more. It 
> therefore takes the next request on the same connection and tries to connect 
> to the mod_wsgi daemon process, but using the proxy socket name as was used 
> before. That name has changed for the next Apache configuration generation 
> and no longer exists, thus it fails.
> 
> Is there any way to avoid Apache graceful restarts? Are Apache graceful 
> restarts triggered by "MaxConnectionsPerChild" or other settings?
> If so, is it better to control this via "maximum-requests" in the mod_wsgi 
> settings?

The maximum-requests option pertains to mod_wsgi daemon mode processes.
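
If the daemon processes do turn out to be the ones growing, a sketch of adding 
it to the WSGIDaemonProcess directive you posted would be as follows; the 
value of 10000 is an arbitrary assumption you would tune:

    WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 \
        maximum-requests=10000 \
        python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages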

This is where you have to correctly identify which processes are actually 
growing: the Apache child worker processes or the mod_wsgi daemon processes 
that your WSGI application is running in.

If it is the ones your WSGI application is in, there are many things you could 
be doing wrong which would cause memory usage to keep growing.

You might even be encountering bugs in third party packages you use. Django, 
for example, until at least 1.6.?, has had an issue with its signal mechanism 
that could result in deadlocks when garbage collection is being done. This 
could lock up a request thread, but then also cause the garbage collector to 
not run again. The result is that memory usage could keep growing and growing, 
since the garbage collector will never be able to reclaim objects.

So the big question still is, which processes are the ones growing in memory 
usage? Only then can one say what you really need to do and give suggestions 
on how to track it down.
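
As a quick cross check outside of New Relic, standard tools will show per 
process memory. A sketch only, assuming Ubuntu's apache2 binary name and the 
display-name=site-1 from your configuration:

    # Apache parent and child worker processes, with resident (RSS) and
    # virtual (VSZ) memory sizes in kilobytes.
    ps -C apache2 -o pid,rss,vsz,cmd

    # mod_wsgi daemon processes, identifiable by their display-name.
    ps axo pid,rss,vsz,cmd | grep '[s]ite-1'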

For reference, the Apache/mod_wsgi platform plugin is detailed at:

    https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0

The Django GC deadlock issue as it came to our attention, with links to Django 
bug reports, can be found at:

    https://discuss.newrelic.com/t/background-thread-slowly-leaks-memory/2170

Graham

> Are you using any Apache modules for implementing caching in Apache?
> 
> No. I just have application-level caching (ORM caching and memcache for some 
> of the requests).
> 
> 
> On Tuesday, October 28, 2014 6:21:22 PM UTC+8, Graham Dumpleton wrote:
> 
> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
> 
>> Hi Graham,
>> 
>> Thanks a lot for your detailed explanations.
>> 
>> I have been reloading the Apache processes instead of restarting them.
>> So is there any relation to the "MaxConnectionsPerChild" setting, whereby 
>> when a process meets the limit, the child process is restarted?
> 
> There shouldn't be, as MaxConnectionsPerChild only pertains to the Apache 
> child worker processes and the number of connections. When a specific Apache 
> child worker process is restarted, it is a form of graceful restart, but the 
> Apache configuration isn't being re-read by Apache as a whole, so the 
> 'generation' counter inside Apache wouldn't change, and so the name of the 
> proxy socket file wouldn't change either. So that option shouldn't cause 
> those errors.
> 
> If so, is there any alternative to this setting? I used this setting to 
> bound the memory usage of Apache.
> 
> The issue is why you would be seeing memory growth in the Apache child worker 
> processes to start with and how much. By rights they shouldn't keep 
> increasing in memory usage. They can increase in memory a bit, but then 
> should plateau. For example:
> 
> [chart image: 'Apache Child Process Memory (Average)' plateauing over time]
> 
> If using mod_wsgi daemon mode, where the Apache child worker processes are 
> only proxying requests or serving static files, this growth up to a ceiling, 
> as reflected in 'Apache Child Process Memory (Average)', is generally the 
> result of the per worker thread memory pools that Apache uses.
> 
> The problem is that there may not be a limit on the upper size of the per 
> worker thread memory pools; that is, the size is unbounded. This is 
> especially the case in Apache 2.2, as the compiled in default is unlimited, 
> so if the configuration file doesn't set it, then the ceiling can grow to be 
> quite high, as it depends a lot on how much data may get buffered in memory 
> due to slow HTTP clients.
> 
> In Apache 2.4 there is now at least a compiled in default, but it still may 
> be higher than desirable. That default is:
> 
> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
> 
> This means that for each worker thread in a process, the memory pool 
> associated with it can retain 2MB of memory. As you have 25 worker threads, 
> these memory pools can consume up to 50MB per worker process. You then have 
> up to 6 worker processes. So that is 300MB in the worst case, if the 
> throughput was enough to keep all the processes active and Apache didn't 
> start killing them off as not needed. But then the number of processes will 
> not go all the way back to 1 due to MaxSpareThreads being 75, thus it will 
> always keep at least 3 processes around.
> 
> Anyway, if memory usage in the Apache child worker processes is a big issue, 
> especially where you are actually delegating the WSGI application to run in 
> mod_wsgi daemon mode, meaning that the Apache child worker processes should 
> be able to run quite lean, then you can adjust MaxMemFree down from the 
> quite high default in Apache 2.4 (there being no such default in Apache 2.2). 
> 
> There are two other changes you can also make related to memory usage in the 
> Apache child worker processes.
> 
> The first is, if you are always using mod_wsgi daemon mode and never 
> requiring embedded mode, to turn off initialisation of the Python 
> interpreter in the Apache child worker processes.
> 
> The second is that on Linux the default per thread stack size is 8MB. This 
> much shouldn't usually be required and really only counts towards virtual 
> memory usage, but some VPS systems count virtual memory for billing purposes 
> so it can become a problem that way.
> 
> So rather than thinking that MaxConnectionsPerChild is the only solution, use 
> directives to control how much memory may be getting used and/or retained by 
> the worker threads.
> 
> In mod_wsgi-express for example, where mod_wsgi daemon mode is always used, 
> the configuration it generates includes the following saner defaults:
> 
> # Turn off Python interpreter initialisation in the Apache child worker 
> # processes, as it is not required if using mod_wsgi daemon mode 
> # exclusively. This will be overridden if you enable use of Python scripts 
> # for access control/authentication/authorisation, which have to run in the 
> # Apache child worker processes.
> 
> WSGIRestrictEmbedded On
> 
> # Set a limit on the amount of memory which will be retained in per worker 
> # thread memory pools. More memory than this can still be used if need be, 
> # but when no longer required, anything above this limit is released back to 
> # the process level memory allocator for reuse, rather than being retained 
> # for exclusive use by the thread, with the risk of a higher memory level.
> 
> MaxMemFree 64
> 
> # Reduce the notional per thread stack size for all the worker threads. This 
> # relates more to virtual memory usage, but some VPS systems count virtual 
> # memory for billing purposes.
> 
> ThreadStackSize 262144
> 
>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Has mod_wsgi 4.3.0 
>> improved handling of segmentation fault errors?
> 
> As far as what was being discussed before, version 4.3.0 makes the error 
> messages more descriptive of the issue and introduces new messages where 
> they weren't able to be distinguished before. It wasn't possible to do this 
> as well before, as Apache code was being relied on to handle reading back 
> data from mod_wsgi daemon processes. The mod_wsgi code now does this itself 
> and can better control things. In mod_wsgi version 4.4.0, if I can get the 
> changes completed (more likely 4.5.0), there will be better messages again 
> for some things, as error codes that were previously being thrown away by 
> Apache will actually be known.
> 
> So it will not change how a segmentation fault is handled, as that can't be 
> changed, just the wording of the error message when a mod_wsgi daemon 
> process may have died or was shut down.
> 
> I have thrown a lot of information at you here, but do you actually have 
> figures on what the memory usage of the Apache child worker processes grows 
> to? Are you using any Apache modules for implementing caching in Apache?
> 
> Graham 
> 
>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>> Would suggest upgrading to mod_wsgi 4.3.0 if you can, as the error messages 
>> when there are communication problems between the Apache child worker 
>> processes and mod_wsgi daemon processes have been improved.
>> 
>> More comments below.
>> 
>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>> Hi Graham and everyone else
>> 
>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), 
>> OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>> I found that the server started returning 504 and then 503, and the 
>> following error showed up.
>> I researched some issues related to it, and even added "WSGISocketPrefix 
>> /var/run/apache2/wsgi", but the issue still occurred.
>> I have no idea why it happened. Can anyone give some directions on this 
>> issue?
>> 
>> Thanks!
>> 
>> apache error log
>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 
>> 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response 
>> headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 
>> 140052910765824] (11)Resource temporarily unavailable: [client 
>> xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon 
>> process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>> 
>> This one can occur when the mod_wsgi daemon process crashes. There should 
>> be a segmentation fault error message or similar in the main Apache error 
>> log (not the VirtualHost specific log).
>> 
>> It can also occur if there are incomplete requests still running when a 
>> mod_wsgi daemon process is shut down on being restarted, due to the WSGI 
>> script file being touched or Apache being restarted. In the latter case, 
>> the mod_wsgi daemon process would have had to have been killed off by 
>> Apache before the Apache child worker process which was proxying to it 
>> was. This can especially be the case if an Apache graceful restart was 
>> being done. 
>>  
>> occasionally
>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 
>> 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] 
>> mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 
>> 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.
>> 
>> This can also be due to an Apache graceful restart being done while there 
>> were keep alive connections being handled from a HTTP client. In an Apache 
>> graceful restart, because of how Apache handles the mod_wsgi daemon 
>> processes, they don't have a graceful shutdown in the same way as Apache 
>> child worker processes.
>> 
>> So what happens is the following:
>> 
>> 1. Apache graceful restart is triggered.
>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to 
>> signal graceful shutdown.
>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal 
>> shutdown.
>> 4. The mod_wsgi daemon processes complete their requests and restart. In 
>> the next incarnation of the mod_wsgi daemon processes after an Apache 
>> restart, they expect a different path for the proxy socket, with the number 
>> at the end increasing based on the Apache generation number.
>> 5. The Apache child worker process, because it was in graceful restart 
>> mode, operates on the understanding that it can keep handling any requests 
>> on a keep alive socket connection from a HTTP client until there are no 
>> more. It therefore takes the next request on the same connection and tries 
>> to connect to the mod_wsgi daemon process, but using the proxy socket name 
>> as was used before. That name has changed for the next Apache configuration 
>> generation and no longer exists, thus it fails.
>> 
>> The name of the proxy socket changes across Apache restarts because 
>> otherwise you could have Apache child worker processes under an old 
>> configuration sending requests to a mod_wsgi daemon process using the new 
>> configuration, which could cause problems including security issues. There 
>> are therefore specific protections in place to ensure that only Apache child 
>> worker processes and mod_wsgi daemon mode processes created against the same 
>> Apache configuration generation can talk to each other.
>>  
>> wsgi config for that site
>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 
>> python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>> WSGIProcessGroup site-1
>> WSGIApplicationGroup %{GLOBAL}
>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>> 
>> worker.conf
>> <IfModule mpm_worker_module>
>>        StartServers                 2
>>        MinSpareThreads             25
>>        MaxSpareThreads             75
>>        ThreadLimit                 64
>>        ThreadsPerChild             25
>>        MaxRequestWorkers          150
>>        MaxConnectionsPerChild    1000
>> </IfModule>
>> 
>> So my best guess is that you are doing Apache graceful restarts when these 
>> are occurring.
>> 
>> Are you using Apache graceful restarts as suspected?
>> 
>> Graham 
>> 
>> 
> 
> 
