Re: [modwsgi] mod_wsgi returning 503 service unavailable

Graham Dumpleton Tue, 28 Oct 2014 04:40:19 -0700

On 28/10/2014, at 10:07 PM, Graham Dumpleton <[email protected]> wrote:


> One further comment about those per worker thread memory pools. It isn't an 
> issue for you as you are using Apache 2.4, but Apache 2.2 in particular may 
> be susceptible to further problems with them when doing proxying, as is the 
> case with mod_wsgi daemon mode. This is exacerbated by the size of those 
> pools being unbounded by default in Apache 2.2.
> 
> To understand the problem, I will explain what happens in Apache 2.4 when 
> proxying.
> 
> So when a request is received by Apache it is accepted by the Apache child 
> worker process. Apache works out that it needs to be handled by the WSGI 
> application in a mod_wsgi daemon process. The request details (headers) are 
> then proxied across to the mod_wsgi daemon process. There is a little 
> handshake done to see if the mod_wsgi daemon process will accept it, or 
> whether something has changed and the mod_wsgi daemon process wants to do a 
> restart at that time. This might occur where the WSGI script file was touched.
> 
> If the mod_wsgi daemon process is going to accept the request, all the 
> request content is then sent across to the mod_wsgi daemon process. The WSGI 
> application should be reading this content. Only once the request content has 
> all been sent, will the Apache child worker process start looking for 
> response headers and content from the mod_wsgi daemon process.
> 
> Now, when dealing with this response, the Apache child worker process will 
> simply go into a loop of reading data from the mod_wsgi daemon process and 
> writing it back out to the client. In writing back data to the HTTP client 
> though, it isn't direct onto the socket connection. Instead it is via 
> Apache's output filter stack where various transformations may be applied. 
> When it finally gets down to the lowest output filter, which is the core 
> output filter, if that filter determines that writing data onto the socket 
> would block, then rather than block it will buffer the data and return. This 
> allows whatever is producing data to keep working.
> 
> The problem that can arise here is that if the producer of the data, in this 
> case the code reading the response from the mod_wsgi daemon process, is 
> generating data quicker than the HTTP client is consuming it, then data could 
> keep getting buffered up and so grow memory. Because of this issue there is 
> some protection though, in the form of a threshold on the amount of data 
> that will be allowed to be buffered. When this threshold is reached, the 
> writing of data back to the HTTP client will be forced to block until the 
> HTTP client reads more data.
> 
> So there is definitely protection on buffering up too much data in memory in 
> Apache 2.4. Right now my understanding is that this protection may not exist 
> in Apache 2.2.  Certainly the code in Apache 2.2 is quite different and there 
> aren't huge comments in it like in Apache 2.4 which explain how the buffering 
> works.
> 
> For Apache 2.2 then, what this may mean if there is indeed no limit on the 
> amount of buffering is that if you have a slow HTTP client, then some part of 
> the response coming back from a WSGI application in mod_wsgi daemon mode 
> might get buffered up in the memory of the Apache child worker process. This 
> will be allocated against the per worker thread memory pool. Thus memory can 
> blow out. Further, with there being no MaxMemFree default in Apache 2.2, even 
> when the request is done, that memory is retained against the per worker 
> thread memory pool and will not be released back to the process to be used by 
> other things.
> 
> There are two potential problems with Apache 2.2 therefore. Potentially no 
> limit on how much can be buffered in memory and then even when a request 
> completes, all that memory which was buffered gets retained by that worker 
> thread.
> 
> The moral of this story. Use Apache 2.4, and do not use Apache 2.2.
> 
> Since I have never gone and tested Apache 2.2 to validate my understanding on 
> this point, probably about time I went and did that. :-)

And after a bit of playing around, this is definitely a problem in Apache 2.2.

I don't quite understand how its algorithm works because if I actually suspend 
the HTTP client process the Apache child worker process will actually block and 
not try and send more data.

The problem case is therefore specifically where the HTTP client is still 
actually reading some data, but not at a fast enough rate. My best guess based 
on what I am observing is that if no data at all can be written then it will 
block until it can write some more and so everything halts at that point. If 
it can do at least a partial write, then it seems it will allow the remainder 
to be buffered. So as long as the HTTP client is reading at a fast enough rate 
that the write doesn't block entirely, the excess coming through from the 
mod_wsgi daemon process can backlog and get buffered.
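
As a rough illustration of that guess, the following Python simulation models 
a producer that hands over one chunk per round and a client that drains a 
fixed amount per round. It is only a sketch of the inferred behaviour, not 
Apache's code: the function name, the chunk sizes and the optional threshold 
argument are all hypothetical.

```python
def simulate_backlog(rounds, produced_per_round, drained_per_round, threshold=None):
    """Return the peak number of bytes held in memory over the run.

    threshold=None models the suspected Apache 2.2 behaviour (no cap on
    buffered data). A number models a cap like the one described for
    Apache 2.4, where the writer has to wait until the client has drained
    enough. All names and sizes here are made up for illustration.
    """
    buffered = 0
    peak = 0
    for _ in range(rounds):
        # The client reads whatever it can manage this round.
        buffered -= min(buffered, drained_per_round)
        # The producer only hands over a chunk if it fits under the cap;
        # otherwise it "blocks" (skips the round) until there is room.
        if threshold is None or buffered + produced_per_round <= threshold:
            buffered += produced_per_round
        peak = max(peak, buffered)
    return peak


if __name__ == "__main__":
    # Producer writes 64 KB a round, slow client reads 16 KB a round.
    print(simulate_backlog(1000, 65536, 16384))                    # no cap
    print(simulate_backlog(1000, 65536, 16384, threshold=131072))  # capped
```

With no cap the peak keeps growing with the number of rounds for as long as 
the client is slower than the producer. With a cap the peak never exceeds the 
threshold, at the cost of the producer having to wait.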

So it is quite nasty, but the only proper solution is simply to use Apache 2.4 
where the problem is addressed. The only workaround I can do at the mod_wsgi 
level is to periodically force a flush of data based on the volume being 
written. Probably something that I should finally do now that I am trying to be 
active in doing more work on it. :-) 
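
A rough sketch of what a volume based flush could look like, in Python for 
illustration only. mod_wsgi itself is written in C and this is not its code; 
the class name, the flush_every parameter and the 64 KB figure are 
hypothetical.

```python
import io


class VolumeFlushingWriter:
    """Wrap a binary stream and flush it each time flush_every bytes
    have been written since the last flush (hypothetical helper)."""

    def __init__(self, stream, flush_every=65536):
        self.stream = stream
        self.flush_every = flush_every
        self.unflushed = 0    # bytes written since the last forced flush
        self.flush_count = 0  # number of forced flushes so far

    def write(self, data):
        self.stream.write(data)
        self.unflushed += len(data)
        if self.unflushed >= self.flush_every:
            # Force the pending data out instead of letting more pile up
            # behind it.
            self.stream.flush()
            self.unflushed = 0
            self.flush_count += 1


if __name__ == "__main__":
    out = VolumeFlushingWriter(io.BytesIO(), flush_every=65536)
    for _ in range(100):
        out.write(b"x" * 8192)
    print(out.flush_count)
```

The point of counting volume rather than flushing on every write is to bound 
how much can accumulate between flushes without paying the cost of a flush 
for each small block.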

Graham

> On 28/10/2014, at 9:21 PM, Graham Dumpleton <[email protected]> 
> wrote:
> 
>> 
>> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
>> 
>>> Hi Graham,
>>> 
>>> Thanks a lot for your detailed explanations.
>>> 
>>> I used to reload the apache processes instead of restart them.
>>> So is there any relation to the "MaxConnectionsPerChild" setting, in that 
>>> when a process reaches the limit, the child process is restarted?
>> 
>> There shouldn't be as MaxConnectionsPerChild only pertains to the Apache 
>> child worker processes and the number of connections. When a specific Apache 
>> child worker process is restarted, it is in a form of graceful restart, but 
>> the Apache configuration isn't being read by Apache as a whole so the 
>> 'generation' counter inside Apache wouldn't change and so the name of the 
>> proxy socket file wouldn't change either. So that option shouldn't cause 
>> those errors.
>> 
>>> If so, any alternative to this setting? I used this setting to bound the 
>>> memory usage of apache.
>> 
>> The issue is why you would be seeing memory growth in the Apache child 
>> worker processes to start with and how much. By rights they shouldn't keep 
>> increasing in memory usage. They can increase in memory a bit, but then 
>> should plateau. For example:
>> 
>> <PastedGraphic-1.tiff>
>> 
>> If using mod_wsgi daemon mode where the Apache child worker processes are only 
>> proxying requests or serving static files, this growth up to a ceiling as 
>> reflected in 'Apache Child Process Memory (Average)' is generally the result 
>> of the per worker thread memory pools that Apache uses.
>> 
>> The problem is that there may be no upper limit on the size of the per 
>> worker thread memory pools, that is, the size is unbounded. This is 
>> especially the case in Apache 2.2 as the compiled in default is unlimited, 
>> so if the configuration file doesn't set it, then the ceiling can grow to be 
>> quite high as it depends a lot on how much data may get buffered in memory 
>> due to slow HTTP clients.
>> 
>> In Apache 2.4 there is now at least a compiled in default, but it still may 
>> be higher than desirable. In Apache 2.4 that default is:
>> 
>> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
>> 
>> This means that for each worker thread in a process, the memory pool 
>> associated with it can retain 2MB of memory. As you have 25 worker threads, 
>> that means these memory pools can consume up to 50MB per worker process. You 
>> then have up to 6 worker processes. So that is 300MB in worst case if the 
>> throughput was enough to keep all the processes active and Apache didn't start 
>> killing them off as not needed. But then the number of processes will not go 
>> all the way back to 1 due to MaxSpareThreads being 75, thus it will always 
>> keep at least 3 processes around.
>> 
>> Anyway, if memory usage in Apache child worker process is a big issue, 
>> especially where you are actually delegating the WSGI application to run in 
>> mod_wsgi daemon mode, meaning that the Apache child worker processes should 
>> be able to be run quite lean, then you can adjust MaxMemFree down from the 
>> quite high default in Apache 2.4 (and nonexistent in Apache 2.2). 
>> 
>> There are two other changes you can also make related to memory usage in the 
>> Apache child worker processes.
>> 
>> The first is if you are always using mod_wsgi daemon mode and never 
>> requiring embedded mode, then turn off initialisation of the Python 
>> interpreter in the Apache child worker processes.
>> 
>> The second is that on Linux the default per thread stack size is 8MB. This 
>> much shouldn't usually be required and really only counts towards virtual 
>> memory usage, but some VPS systems count virtual memory for billing purposes 
>> so it can become a problem that way.
>> 
>> So rather than thinking that MaxConnectionsPerChild is the only solution, 
>> use directives to control how much memory may be getting used and/or 
>> retained by the worker threads.
>> 
>> In mod_wsgi-express for example, the configuration it generates as a saner 
>> default where mod_wsgi daemon mode is always used is:
>> 
>> # Turn off Python interpreter initialisation in Apache child worker
>> # processes as not required if using mod_wsgi daemon mode exclusively.
>> # This will be overridden if use of Python scripts for access
>> # control/authentication/authorisation is enabled, as these have to run
>> # in the Apache child worker processes.
>> 
>> WSGIRestrictEmbedded On
>> 
>> # Set a limit on the amount of memory which will be retained in per worker
>> # memory pools. More memory than this can still be used if need be, but
>> # when no longer required and above this limit it will be released back to
>> # the process level memory allocator for reuse rather than being retained
>> # for exclusive use by the thread, with a risk of a higher memory level.
>> 
>> MaxMemFree 64
>> 
>> # Reduce the notional per thread stack size for all the worker threads.
>> # This relates more to virtual memory usage, but some VPS systems count
>> # virtual memory for billing purposes.
>> 
>> ThreadStackSize 262144
>> 
>>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Does mod_wsgi 4.3.0 
>>> improve handling of segmentation fault errors?
>> 
>> As far as what was being discussed before, version 4.3.0 makes the error 
>> messages more descriptive of the issue and introduces new messages where 
>> they weren't able to be distinguished before. It wasn't possible to do this 
>> as well before as Apache code was being relied on to handle reading back 
>> data from mod_wsgi daemon processes. The mod_wsgi code now does this itself 
>> and can better control things. In mod_wsgi version 4.4.0, if I can get the 
>> changes completed (more likely 4.5.0), there will be better messages again 
>> for some things as error codes that were previously being thrown away by 
>> Apache will actually be known.
>> 
>> So it will not change how a segmentation fault is handled as that can't be 
>> changed, just the wording of the error message when a mod_wsgi daemon 
>> process may have died or was shut down.
>> 
>> I have thrown a lot of information at you here, but do you actually have 
>> figures on what the memory usage of the Apache child worker processes grows 
>> to? Are you using any Apache modules for implementing caching in Apache?
>> 
>> Graham 
>> 
>>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>>> Would suggest upgrading to mod_wsgi 4.3.0 if you can as the error messages 
>>> when there are communication problems between Apache child worker process 
>>> and mod_wsgi daemon process have been improved.
>>> 
>>> More comments below.
>>> 
>>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>>> Hi Graham and everyone else
>>> 
>>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), 
>>> OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>>> I found that the server started returning 504 and then 503, and the 
>>> following errors showed up.
>>> I researched some issues related to it, and even added "WSGISocketPrefix 
>>> /var/run/apache2/wsgi", but the issue still occurred.
>>> I have no idea why it happened. Can anyone give some directions on this 
>>> issue?
>>> 
>>> Thanks!
>>> 
>>> apache error log
>>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 
>>> 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading 
>>> response headers from daemon process 'site-1': 
>>> /home/ubuntu/site-1/apache/wsgi.py
>>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 
>>> 140052910765824] (11)Resource temporarily unavailable: [client 
>>> xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon 
>>> process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>>> 
>>> This one can occur when the mod_wsgi daemon process crashes. There should 
>>> be a segmentation fault error message or similar in the main Apache error 
>>> log (not VirtualHost specific log).
>>> 
>>> It can also occur if there are incomplete requests still running when a 
>>> mod_wsgi daemon process is shut down on being restarted due to the WSGI 
>>> script file being touched or if Apache was restarted. In the latter case, 
>>> the mod_wsgi daemon process would have had to have been killed off by 
>>> Apache before the Apache child worker process which was proxying to it 
>>> had. This can especially be the case if an Apache graceful restart was 
>>> being done. 
>>>  
>>> occasionally
>>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 
>>> 140182690981632] (2)No such file or directory: [client 
>>> 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI 
>>> daemon process 'snaptee-production-api-ssl' on 
>>> '/var/run/apache2/wsgi.30188.7.3.sock'.
>>> 
>>> This can also be due to Apache graceful restart being done and there were 
>>> keep alive connections being handled from a HTTP client. In an Apache 
>>> graceful restart, because of how Apache handles the mod_wsgi daemon processes, 
>>> they don't have a graceful shutdown in the same way as Apache child worker 
>>> processes.
>>> 
>>> So what happens is the following:
>>> 
>>> 1. Apache graceful restart is triggered.
>>> 2. Apache parent process sends SIGUSR1 to Apache child worker processes to 
>>> signal graceful shutdown.
>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to 
>>> signal shutdown.
>>> 4. The mod_wsgi daemon processes complete their requests and restart. In 
>>> the next incarnation of the mod_wsgi daemon processes after an Apache 
>>> restart they expect a different path for the proxy socket, with the number 
>>> at the end increasing based on Apache generation number.
>>> 5. The Apache child worker process, because it was in a graceful restart 
>>> mode, operates on the understanding that it can keep handling any requests 
>>> on a keep alive socket connection from a HTTP client until there are no 
>>> more. It therefore takes the next request on the same connection and tries 
>>> to connect to the mod_wsgi daemon process using the proxy socket name that 
>>> was used before. That name has changed for the next Apache configuration 
>>> generation and no longer exists, thus it fails.
>>> 
>>> The name of the proxy socket changes across Apache restarts because 
>>> otherwise you could have Apache child worker processes under an old 
>>> configuration sending requests to a mod_wsgi daemon process using the new 
>>> configuration, which could cause problems including security issues. There 
>>> are therefore specific protections in place to ensure that only Apache 
>>> child worker processes and mod_wsgi daemon mode processes created against 
>>> the same Apache configuration generation can talk to each other.
>>>  
>>> wsgi config for that site
>>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 
>>> python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>>> WSGIProcessGroup site-1
>>> WSGIApplicationGroup %{GLOBAL}
>>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>>> 
>>> worker.conf
>>> <IfModule mpm_worker_module>
>>>        StartServers                 2
>>>        MinSpareThreads             25
>>>        MaxSpareThreads             75
>>>        ThreadLimit                 64
>>>        ThreadsPerChild             25
>>>        MaxRequestWorkers          150
>>>        MaxConnectionsPerChild    1000
>>> </IfModule>
>>> 
>>> So my best guess is that you are doing Apache graceful restarts when these 
>>> are occurring.
>>> 
>>> Are you using Apache graceful restarts as suspected?
>>> 
>>> Graham 
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "modwsgi" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/modwsgi.
>>> For more options, visit https://groups.google.com/d/optout.
>> 
> 
