Here is part of the main Apache error log from when the 503 happened.

[Wed Oct 29 12:56:26.727197 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00287: server is within MinSpareThreads of MaxRequestWorkers, consider raising the MaxRequestWorkers setting
[Wed Oct 29 12:56:30.730902 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

What does it mean? Can it be solved by raising "MaxRequestWorkers"?

Thanks.
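For context, AH00287 means the server is within MinSpareThreads of the MaxRequestWorkers ceiling and AH00286 means it has reached it, so all allowed request threads are busy and further connections have to queue. Raising the ceiling can help if the machine has memory to spare for more processes, but with the worker MPM, MaxRequestWorkers also has to fit within ServerLimit x ThreadsPerChild. A minimal sketch using the ThreadsPerChild value of 25 shown further down the thread; the numbers are purely illustrative:

<IfModule mpm_worker_module>
    # MaxRequestWorkers may not exceed ServerLimit x ThreadsPerChild, so when
    # raising it past that product, raise ServerLimit as well.
    ServerLimit            10
    ThreadsPerChild        25
    MaxRequestWorkers      250
</IfModule>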
On Thursday, October 30, 2014 10:12:19 AM UTC+8, Kelvin Wong wrote:

> Hi Graham
>
> I upgraded mod_wsgi to 4.3.0. The 503 situation happened again. The error message is the same as before, but this time there is no more "Timeout when reading response headers from daemon process" before the "(11)Resource temporarily unavailable".
> Is there anything I can do to prevent this kind of situation? Or is there any way to make Apache self-heal?
>
> Thanks.
>
> On Wednesday, October 29, 2014 10:36:13 AM UTC+8, Graham Dumpleton wrote:
>>
>> On 28/10/2014, at 10:58 PM, Kelvin Wong <[email protected]> wrote:
>>
>> do you actually have figures on what the memory usage of the Apache child worker processes grows to?
>>
>> I do. I used New Relic to monitor the system resource usage. I found that as time goes on, the Apache processes take a lot of memory. That's why I want to control the memory usage of Apache.
>>
>> Okay, but where in New Relic are you monitoring this? I am concerned now as to whether you are even looking at just the Apache child worker processes that MaxConnectionsPerChild pertains to.
>>
>> If you were looking at the host breakout chart on the overview dashboard for the WSGI application being monitored by the Python web application agent, and you are using daemon mode, then what you are looking at is the memory taken by the mod_wsgi daemon processes and not the Apache child worker processes. As a consequence the MaxConnectionsPerChild directive doesn't apply.
>>
>> If you were looking at the server monitoring charts and looking at the Apache httpd/apache2 process, then that is all processes under Apache, which counts both the Apache child worker processes and the mod_wsgi daemon processes. If you relied on those charts, you can't tell whether it is the Apache child processes or the mod_wsgi daemon processes.
>>
>> So you can't, from either the Python web application agent or the server monitoring agent, tell how much memory is being used by just the Apache child worker processes.
>>
>> In the chart I included, which you can still see below, that is relying on a platform plugin agent for Apache/mod_wsgi. Unlike the others, it does pull out memory just for the Apache child worker processes. I then created a custom dashboard which includes charts for metrics from both the Python web application agent and the Apache/mod_wsgi platform plugin so I can cross compare them. That is how I got all the charts I showed.
>>
>> So right now I am questioning whether you should be using MaxConnectionsPerChild, as it is more likely that you may be looking at the size of the mod_wsgi daemon processes which actually contain your WSGI application.
>>
>> Also, my application is mainly APIs for a mobile application which involve uploading files/images. I found that there are a lot of IOErrors occurring, as it seems the uploads are unexpectedly terminated by the mobile application.
>> Do you have any suggestions on this?
>>
>> You can't stop connections being dropped, especially with mobile agents.
>>
>> What size are the images?
>>
>> One thing you can do, and which is actually a good idea overall independent of your specific situation, is to place an nginx front end proxy in front of Apache/mod_wsgi. The preferable way of configuring nginx is to have it use HTTP/1.1 and keep alive connections to Apache.
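Circling back to the point above about MaxConnectionsPerChild only pertaining to the Apache child worker processes: the WSGI application itself lives in the mod_wsgi daemon processes, which are recycled by a separate option on WSGIDaemonProcess. A rough sketch of the two knobs side by side, with illustrative values and the process group name that appears further down the thread (recycling processes is only a workaround for memory growth, not a fix for its cause):

# Recycles only the Apache child worker processes (the proxying/static file side).
MaxConnectionsPerChild 1000

# Recycles the mod_wsgi daemon processes that actually run the WSGI application
# after the given number of requests.
WSGIDaemonProcess site-1 threads=25 maximum-requests=1000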
>> To use HTTP/1.1 keep alive connections from nginx you have to be on top of understanding your configuration though. If you aren't, you are better off using the default of HTTP/1.0 for the proxy connections from nginx to Apache.
>>
>> Either way, the reason nginx helps is that when doing proxying, nginx can pre buffer up to a certain amount of request content and, if the request content is below the limit, will only bother proxying a request to Apache when it has successfully received it all. Thus Apache will not get troubled by any requests which got dropped, and so the IOError issue can be cut dramatically.
>>
>> In short, nginx helps to isolate Apache from slow HTTP clients and can make Apache perform better with fewer resources.
>>
>> And will these kinds of requests be kept in memory forever because they were handled incorrectly, and make the memory usage grow?
>>
>> No, they aren't held indefinitely.
>>
>> The problem with slow HTTP clients is that although no data is coming through, the connection is still held open until a timeout occurs based on the Timeout directive, and then the connection is dropped.
>>
>> What are you using for the Timeout directive?
>>
>> The compiled in default for Timeout is 60 seconds, but the sample configuration files often have 300 seconds. 300 seconds is way too high, and for many situations 60 seconds is also too much, but you have to be a bit careful about dropping it too low.
>>
>> This again though is where nginx as a front end proxy helps, because the request would simply never get through to Apache if the content wasn't coming through and the expected content was under the limit.
>>
>> Yes, nginx still has to deal with the hung connection, but it is much more efficient at that than Apache, as nginx uses an async event driven system to manage many connections in one thread whereas Apache uses a thread per connection.
>>
>> So what happens is the following:
>>>
>>> 1. Apache graceful restart is triggered.
>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on the Apache generation number.
>>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name as was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>
>> Is there any way to avoid an Apache graceful restart? Is an Apache graceful restart triggered by "MaxConnectionsPerChild" or other settings? If so, is it better to control this with "maximum-requests" in the mod_wsgi settings?
>>
>> The maximum-requests option pertains to the mod_wsgi daemon mode processes.
>>
>> This is where you have to correctly identify which processes are actually growing, the Apache child worker processes or the mod_wsgi daemon processes that your WSGI application is running in.
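One practical way of doing that identification on the host itself, separate from New Relic, is to give the mod_wsgi daemon processes their own display-name so they show up under a distinct label in ps/top, letting their resident memory be watched separately from the plain apache2 child worker processes. A sketch, reusing the process group name that appears further down the thread (the label itself is arbitrary):

# With display-name=%{GROUP} the daemon processes appear in ps/top as
# "(wsgi:site-1)" instead of the generic apache2 name, so their memory
# usage can be read separately from the Apache child worker processes.
WSGIDaemonProcess site-1 display-name=%{GROUP} threads=25
WSGIProcessGroup site-1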
>> If it is the one your WSGI application is in, there are many things you could be doing wrong which would cause memory usage to keep growing.
>>
>> You might even be encountering bugs in third party packages you use. Django for example, until at least 1.6.?, has had an issue with its signal mechanism that could result in deadlocks when garbage collection is being done. This could lock up a request thread, but then also cause the garbage collector to not run again. The result being that memory usage could keep growing and growing, since the garbage collector will never be able to reclaim objects.
>>
>> So the big question still is, which processes are the ones growing in memory usage? Only then can I say what you really need to do and give suggestions on how to track it down.
>>
>> For reference, the Apache/mod_wsgi platform plugin is detailed at:
>>
>> https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0
>>
>> The Django GC deadlock issue as it came to our attention, with links to the Django bug reports, can be found at:
>>
>> https://discuss.newrelic.com/t/background-thread-slowly-leaks-memory/2170
>>
>> Graham
>>
>> Are you using any Apache modules for implementing caching in Apache?
>>
>> No. I just have application-level caching (ORM caching and memcache for some of the requests).
>>
>> On Tuesday, October 28, 2014 6:21:22 PM UTC+8, Graham Dumpleton wrote:
>>>
>>> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
>>>
>>> Hi Graham,
>>>
>>> Thanks a lot for your detailed explanations.
>>>
>>> I used to reload the Apache processes instead of restarting them. So is there any relation to the "MaxConnectionsPerChild" setting, such that when a process hits the limit, it restarts the child process?
>>>
>>> There shouldn't be, as MaxConnectionsPerChild only pertains to the Apache child worker processes and the number of connections. When a specific Apache child worker process is restarted, it is a form of graceful restart, but the Apache configuration isn't being re-read by Apache as a whole, so the 'generation' counter inside Apache wouldn't change and so the name of the proxy socket file wouldn't change either. So that option shouldn't cause those errors.
>>>
>>> If so, is there any alternative to this setting? I used this setting to bound the memory usage of Apache.
>>>
>>> The issue is why you would be seeing memory growth in the Apache child worker processes to start with, and how much. By rights they shouldn't keep increasing in memory usage. They can increase in memory a bit, but then should plateau. For example:
>>>
>>> If using mod_wsgi daemon mode where the Apache child worker processes are only proxying requests or serving static files, this growth up to a ceiling, as reflected in 'Apache Child Process Memory (Average)', is generally the result of the per worker thread memory pools that Apache uses.
>>>
>>> The problem is that there may not be a limit on the upper size of the per worker thread memory pools, that is, the size is unbounded. This is especially the case in Apache 2.2 as the compiled in default is unlimited, so if the configuration file doesn't set it, then the ceiling can grow to be quite high as it depends a lot on how much data may get buffered in memory due to slow HTTP clients.
>>>
>>> In Apache 2.4 there is now at least a compiled in default, but it still may be higher than desirable.
>>> In Apache 2.4 that default is:
>>>
>>> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
>>>
>>> This means that for each worker thread in a process, the memory pool associated with it can retain 2MB of memory. As you have 25 worker threads, that means these memory pools can consume up to 50MB per worker process. You then have up to 6 worker processes, so that is 300MB in the worst case if the throughput was enough to keep all the processes active and Apache didn't start killing them off as not needed. But then the number of processes will not go all the way back down to 1 due to MaxSpareThreads being 75, thus it will always keep at least 3 processes around.
>>>
>>> Anyway, if memory usage in the Apache child worker processes is a big issue, especially where you are actually delegating the WSGI application to run in mod_wsgi daemon mode, meaning that the Apache child worker processes should be able to run quite lean, then you can adjust MaxMemFree down from the quite high default in Apache 2.4 (and non-existent default in Apache 2.2).
>>>
>>> There are two other changes you can also make related to memory usage in the Apache child worker processes.
>>>
>>> The first is, if you are always using mod_wsgi daemon mode and never requiring embedded mode, to turn off initialisation of the Python interpreter in the Apache child worker processes.
>>>
>>> The second is that on Linux the default per thread stack size is 8MB. This much shouldn't usually be required and really only counts towards virtual memory usage, but some VPS systems count virtual memory for billing purposes, so it can become a problem that way.
>>>
>>> So rather than thinking that MaxConnectionsPerChild is the only solution, use directives to control how much memory may be getting used and/or retained by the worker threads.
>>>
>>> In mod_wsgi-express, for example, the configuration it generates by default, as a saner default where mod_wsgi daemon mode is always used, is:
>>>
>>> # Turn off Python interpreter initialisation in the Apache child worker
>>> # processes as it is not required if using mod_wsgi daemon mode exclusively.
>>> # This will be overridden if use of Python scripts for access
>>> # control/authentication/authorisation is enabled, as those have to run in
>>> # the Apache child worker processes.
>>>
>>> WSGIRestrictEmbedded On
>>>
>>> # Set a limit on the amount of memory which will be retained in per worker
>>> # memory pools. More memory than this can still be used if need be, but when
>>> # no longer required and above this limit, it will be released back to the
>>> # process level memory allocator for reuse rather than being retained for
>>> # exclusive use by the thread, with the risk of a higher memory level.
>>>
>>> MaxMemFree 64
>>>
>>> # Reduce the notional per thread stack size for all the worker threads. This
>>> # relates more to virtual memory usage, but some VPS systems count virtual
>>> # memory for billing purposes.
>>>
>>> ThreadStackSize 262144
>>>
>>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Has mod_wsgi 4.3.0 improved handling of segmentation fault errors?
>>>
>>> As far as what was being discussed before, version 4.3.0 makes the error messages more descriptive of the issue and introduces new messages where they weren't able to be distinguished before. It wasn't possible to do this as well before, as Apache code was being relied on to handle reading back data from the mod_wsgi daemon processes.
>>> The mod_wsgi code now does this itself and can better control things. In mod_wsgi version 4.4.0, if I can get the changes completed (more likely 4.5.0), there will be better messages again for some things, as error codes that were previously being thrown away by Apache will actually be known.
>>>
>>> So it will not change how a segmentation fault is handled, as that can't be changed, just the wording of the error message when a mod_wsgi daemon process may have died or was shut down.
>>>
>>> I have thrown a lot of information at you here, but do you actually have figures on what the memory usage of the Apache child worker processes grows to? Are you using any Apache modules for implementing caching in Apache?
>>>
>>> Graham
>>>
>>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>>>>
>>>> Would suggest upgrading to mod_wsgi 4.3.0 if you can, as the error messages when there are communication problems between the Apache child worker processes and the mod_wsgi daemon processes have been improved.
>>>>
>>>> More comments below.
>>>>
>>>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>>>>>
>>>>> Hi Graham and everyone else
>>>>>
>>>>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>>>>> I found that the server started returning 504 and then 503, and the following errors showed up.
>>>>> I researched some issues related to it, and even added "WSGISocketPrefix /var/run/apache2/wsgi", but the issue still occurred.
>>>>> I have no idea why it happened. Can anyone give some direction on this issue?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> apache error log
>>>>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
>>>>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>>>>
>>>> This one can occur when the mod_wsgi daemon process crashes. There should be a segmentation fault error message or similar in the main Apache error log (not the VirtualHost specific log).
>>>>
>>>> It can also occur if there are incomplete requests still running when a mod_wsgi daemon process is shut down on being restarted, due to the WSGI script file being touched or if Apache was restarted. In the latter case, the mod_wsgi daemon process would have had to have been killed off by Apache before the Apache child worker process which was proxying to it had been. This can especially be the case if an Apache graceful restart was being done.
>>>>
>>>>> occasionally
>>>>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.
>>>>
>>>> This can also be due to an Apache graceful restart being done while there were keep alive connections being handled from a HTTP client.
>>>> In an Apache graceful restart, because of how Apache handles the mod_wsgi daemon processes, they don't have a graceful shutdown in the same way as the Apache child worker processes.
>>>>
>>>> So what happens is the following:
>>>>
>>>> 1. Apache graceful restart is triggered.
>>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>>>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on the Apache generation number.
>>>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name as was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>>>
>>>> The name of the proxy socket changes across Apache restarts because otherwise you could have Apache child worker processes under an old configuration sending requests to a mod_wsgi daemon process using the new configuration, which could cause problems including security issues. There are therefore specific protections in place to ensure that only Apache child worker processes and mod_wsgi daemon mode processes created against the same Apache configuration generation can talk to each other.
>>>>
>>>>> wsgi config for that site
>>>>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>>>>> WSGIProcessGroup site-1
>>>>> WSGIApplicationGroup %{GLOBAL}
>>>>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>>>>>
>>>>> worker.conf
>>>>> <IfModule mpm_worker_module>
>>>>>     StartServers             2
>>>>>     MinSpareThreads         25
>>>>>     MaxSpareThreads         75
>>>>>     ThreadLimit             64
>>>>>     ThreadsPerChild         25
>>>>>     MaxRequestWorkers      150
>>>>>     MaxConnectionsPerChild 1000
>>>>> </IfModule>
>>>>
>>>> So my best guess is that you are doing Apache graceful restarts when these errors are occurring.
>>>>
>>>> Are you using Apache graceful restarts as suspected?
>>>>
>>>> Graham
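Pulling the Apache-side directives mentioned in this thread together in one place, roughly as they would sit alongside the worker.conf shown above. The values are simply the ones quoted in the thread (the mod_wsgi-express defaults and the compiled-in Timeout), not a tuned recommendation for this particular site:

# Don't initialise the Python interpreter in the Apache child worker
# processes, since the WSGI application only runs in daemon mode here.
WSGIRestrictEmbedded On

# Cap the memory retained by the per worker thread memory pools (value in KB).
MaxMemFree 64

# Smaller notional per thread stack size for the worker threads (in bytes).
ThreadStackSize 262144

# Compiled-in default; sample configs often ship 300, which is far too high.
Timeout 60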
