Hi Graham,

I upgraded mod_wsgi to 4.3.0, and the 503 situation happened again. The error message is the same as before, but this time there is no "Timeout when reading response headers from daemon process" message before the "(11)Resource temporarily unavailable" one. Is there anything I can do to prevent this kind of situation? Or is there any way to make Apache self-heal?
Thanks.

On Wednesday, October 29, 2014 10:36:13 AM UTC+8, Graham Dumpleton wrote:
>
> On 28/10/2014, at 10:58 PM, Kelvin Wong <[email protected]> wrote:
>
> do you actually have figures on what the memory usage of the Apache child worker processes grows to?
>
> I do. I used New Relic to monitor the system resource usage. I found that as time goes on, the Apache processes take up a lot of memory. That's why I want to control Apache's memory usage.
>
> Okay, but where in New Relic are you monitoring this? I am concerned now as to whether you are even looking at just the Apache child worker processes that MaxConnectionsPerChild pertains to.
>
> If you were looking at the host breakout chart on the overview dashboard for the WSGI application being monitored by the Python web application agent, and you are using daemon mode, then what you are looking at is the memory taken by the mod_wsgi daemon processes and not the Apache child worker processes. As a consequence the MaxConnectionsPerChild directive doesn't apply.
>
> If you were looking at the server monitoring charts and looking at the Apache httpd/apache2 process, then that is all processes under Apache, which counts both the Apache child worker processes and the mod_wsgi daemon processes. If you relied on those charts, you can't tell whether it is the Apache child processes or the mod_wsgi daemon processes.
>
> So you can't tell from either the Python web application agent or the server monitoring agent how much memory is being used by just the Apache child worker processes.
>
> In the chart I included, which you can still see below, that is relying on a platform plugin agent for Apache/mod_wsgi. Unlike the others, it does pull out memory just for the Apache child worker processes. I then created a custom dashboard which includes charts for metrics from both the Python web application agent and the Apache/mod_wsgi platform plugin so I can cross compare them.
> That is how I got all the charts I showed.
>
> So right now I am questioning whether you should be using MaxConnectionsPerChild, as it is more likely that you are looking at the size of the mod_wsgi daemon processes which actually contain your WSGI application.
>
> Also, my application is mainly APIs for a mobile application, which involve uploading files/images. I found that a lot of IOErrors occurred, as it seems the uploads are unexpectedly terminated by the mobile application. Do you have any suggestions on this?
>
> You can't stop connections being dropped, especially with mobile agents.
>
> What size are the images?
>
> One thing you can do, and which is actually a good idea overall independent of your specific situation, is to place an nginx front end proxy in front of Apache/mod_wsgi. The preferable way of configuring nginx is to have it use HTTP/1.1 and keep alive connections to Apache. You have to be on top of understanding your configuration though. If you aren't, you are better off using the default of HTTP/1.0 for the proxy connections from nginx to Apache.
>
> Either way, the reason nginx helps is that when doing proxying, nginx can pre-buffer up to a certain amount of request content and, if the request content is below the limit, will only bother proxying a request to Apache when it has successfully received it all. Thus Apache will not get troubled by any requests which got dropped, and so the IOError issue can be cut dramatically.
>
> In short, nginx helps to isolate Apache from slow HTTP clients and can make Apache perform better with fewer resources.
>
> And will these kinds of requests be kept in memory forever because they were handled incorrectly, making the memory usage grow?
>
> No they aren't held indefinitely.
>
> The problem with slow HTTP clients is that although no data is coming through, the connection is still held open until a timeout occurs based on the Timeout directive, and then the connection is dropped.
>
> What are you using for the Timeout directive?
>
> The compiled in default for Timeout is 60 seconds, but the sample configuration files often have 300 seconds. 300 seconds is way too high and for many situations 60 seconds is also too much, but you have to be a bit careful about dropping it too low.
>
> This again though is where nginx as a front end proxy helps, because the request would simply never get through to Apache if the content wasn't coming through and expected content was under the limit.
>
> Yes nginx still has to deal with the hung connection, but it is much more efficient at that than Apache, as nginx uses an async event driven system to manage many connections in one thread, whereas Apache uses a thread per connection.
>
> So what happens is the following:
>>
>> 1. Apache graceful restart is triggered.
>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on Apache generation number.
>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name that was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>
>
> Is there any way to avoid an Apache graceful restart? Is the Apache graceful restart triggered by the "MaxConnectionsPerChild" or other settings?
> If so, is it better to control this with "maximum-requests" in the mod_wsgi settings?
>
> The maximum-requests option pertains to mod_wsgi daemon mode processes.
>
> This is where you have to correctly identify which processes are actually growing: the Apache child worker processes or the mod_wsgi daemon processes that your WSGI application is running in.
>
> If it is the one your WSGI application is in, there are many things you could be doing wrong which would cause memory usage to keep growing.
>
> You might even be encountering bugs in third party packages you use. Django for example until at least 1.6.? has had an issue with its signal mechanism that could result in deadlocks when garbage collection is being done. This could lock up a request thread, but then also cause the garbage collector to not run again. The result is that memory usage could keep growing and growing, since the garbage collector will never be able to reclaim objects.
>
> So the big question still is, which processes are the ones growing in memory usage? Only then can I say what you really need to do and give suggestions on how to track it down.
>
> For reference, the Apache/mod_wsgi platform plugin is detailed at:
>
> https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0
>
> The Django GC deadlock issue as it came to our attention, with links to Django bug reports, can be found at:
>
> https://discuss.newrelic.com/t/background-thread-slowly-leaks-memory/2170
>
> Graham
>
> Are you using any Apache modules for implementing caching in Apache?
> No. I just have application-level caching (ORM caching and memcache for some of the requests).
>
>
> On Tuesday, October 28, 2014 6:21:22 PM UTC+8, Graham Dumpleton wrote:
>>
>> On 28/10/2014, at 5:18 PM, Kelvin Wong <[email protected]> wrote:
>>
>> Hi Graham,
>>
>> Thanks a lot for your detailed explanations.
>>
>> I used to reload the Apache processes instead of restarting them.
>> So is there any relation to the "MaxConnectionsPerChild" setting, in that when a process meets the limit, the child process is restarted?
>>
>> There shouldn't be, as MaxConnectionsPerChild only pertains to the Apache child worker processes and the number of connections. When a specific Apache child worker process is restarted, it is a form of graceful restart, but the Apache configuration isn't being reread by Apache as a whole, so the 'generation' counter inside Apache wouldn't change and so the name of the proxy socket file wouldn't change either. So that option shouldn't cause those errors.
>>
>> If so, is there any alternative to this setting? I used this setting to bound the memory usage of Apache.
>>
>> The issue is why you would be seeing memory growth in the Apache child worker processes to start with, and how much. By rights they shouldn't keep increasing in memory usage. They can increase in memory a bit, but then should plateau. For example:
>>
>>
>> If using mod_wsgi daemon mode where the Apache child worker processes are only proxying requests or serving static files, this growth up to a ceiling as reflected in 'Apache Child Process Memory (Average)' is generally the result of the per worker thread memory pools that Apache uses.
>>
>> The problem is that there may be no upper limit on the size of the per worker thread memory pools, so their size is unbounded. This is especially the case in Apache 2.2, as the compiled in default is unlimited, so if the configuration file doesn't set it, the ceiling can grow to be quite high, as it depends a lot on how much data may get buffered in memory due to slow HTTP clients.
>>
>> In Apache 2.4 there is now at least a compiled in default, but it still may be higher than desirable.
>> In Apache 2.4 that default is:
>>
>> #define ALLOCATOR_MAX_FREE_DEFAULT (2048*1024)
>>
>> This means that for each worker thread in a process, the memory pool associated with it can retain 2MB of memory. As you have 25 worker threads, that means these memory pools can consume up to 50MB per worker process. You then have up to 6 worker processes. So that is 300MB in the worst case, if the throughput was enough to keep all the processes active and Apache didn't start killing them off as not needed. But then the number of processes will not go all the way back to 1 due to MaxSpareThreads being 75, thus it will always keep at least 3 processes around.
>>
>> Anyway, if memory usage in the Apache child worker processes is a big issue, especially where you are actually delegating the WSGI application to run in mod_wsgi daemon mode, meaning that the Apache child worker processes should be able to run quite lean, then you can adjust MaxMemFree down from the quite high default in Apache 2.4 (and non existent in Apache 2.2).
>>
>> There are two other changes you can also make related to memory usage in the Apache child worker processes.
>>
>> The first is if you are always using mod_wsgi daemon mode and never requiring embedded mode, then turn off initialisation of the Python interpreter in the Apache child worker processes.
>>
>> The second is that on Linux the default per thread stack size is 8MB. This much shouldn't usually be required and really only counts towards virtual memory usage, but some VPS systems count virtual memory for billing purposes so it can become a problem that way.
>>
>> So rather than thinking that MaxConnectionsPerChild is the only solution, use directives to control how much memory may be getting used and/or retained by the worker threads.
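
The pool arithmetic quoted above can be checked with a rough sketch. It only reproduces the multiplication from the figures in this thread (2048*1024 bytes per thread pool, ThreadsPerChild 25, MaxRequestWorkers 150, MaxSpareThreads 75) and, for comparison, the MaxMemFree 64 value from the mod_wsgi-express defaults quoted below. The function and variable names are hypothetical, the 150 MiB idle figure is derived here rather than stated in the thread, and the result is an upper bound on memory the pools may retain, not a measurement of real usage.

```python
# Rough upper bound on memory retained by Apache's per-thread memory pools.
# All input figures come from the thread; the names are made up for this sketch.

KIB = 1024
MIB = 1024 * KIB


def retained_pool_bytes(max_free_kib, threads_per_child, processes):
    """Memory the per-thread pools may hold on to instead of releasing."""
    return max_free_kib * KIB * threads_per_child * processes


threads_per_child = 25      # ThreadsPerChild in worker.conf
max_request_workers = 150   # MaxRequestWorkers in worker.conf
max_spare_threads = 75      # MaxSpareThreads in worker.conf

max_processes = max_request_workers // threads_per_child   # 6 worker processes
idle_processes = max_spare_threads // threads_per_child    # 3 kept around when idle

default_kib = (2048 * 1024) // KIB   # ALLOCATOR_MAX_FREE_DEFAULT expressed in KiB

per_process = retained_pool_bytes(default_kib, threads_per_child, 1)
worst_case = retained_pool_bytes(default_kib, threads_per_child, max_processes)
idle_floor = retained_pool_bytes(default_kib, threads_per_child, idle_processes)
tuned = retained_pool_bytes(64, threads_per_child, max_processes)  # MaxMemFree 64

print(per_process // MIB, "MiB per worker process")
print(worst_case // MIB, "MiB across", max_processes, "processes")
print(idle_floor // MIB, "MiB across", idle_processes, "idle processes")
print(tuned // KIB, "KiB across", max_processes, "processes with MaxMemFree 64")
```

With the default this gives 50 MiB per process and 300 MiB across 6 processes, matching the figures above. MaxMemFree only caps what the pools retain when idle, so actual usage can still exceed these numbers while requests are in flight.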
>>
>> In mod_wsgi-express for example, the configuration it generates as a saner default where mod_wsgi daemon mode is always used is:
>>
>> # Turn off Python interpreter initialisation in Apache child worker process as not required
>> # if using mod_wsgi daemon mode exclusively. This will be overridden if enabled use of
>> # Python scripts for access control/authentication/authorisation which have to run in the
>> # Apache child worker processes.
>>
>> WSGIRestrictEmbedded On
>>
>> # Set a limit on the amount of memory which will be retained in per worker memory pools.
>> # More memory than this can still be used if need be, but when no longer required and above
>> # this limit it will be released back to the process level memory allocator for reuse rather
>> # than being retained for exclusive use by the thread, with a risk of a higher memory level.
>>
>> MaxMemFree 64
>>
>> # Reduce the notional per thread stack size for all the worker threads. This relates more to
>> # virtual memory usage, but some VPS systems count virtual memory for billing purposes.
>>
>> ThreadStackSize 262144
>>
>> Will upgrading to mod_wsgi 4.3.0 solve this problem? Does mod_wsgi 4.3.0 improve handling of segmentation faults?
>>
>> As far as what was being discussed before, version 4.3.0 makes the error messages more descriptive of the issue and introduces new messages where they weren't able to be distinguished before. It wasn't possible to do this as well before, as Apache code was being relied on to handle reading back data from mod_wsgi daemon processes. The mod_wsgi code now does this itself and can better control things. In mod_wsgi version 4.4.0, if I can get the changes completed (more likely 4.5.0), there will be better messages again for some things, as error codes that were previously being thrown away by Apache will actually be known.
>>
>> So it will not change how a segmentation fault is handled, as that can't be changed, just the wording of the error message when a mod_wsgi daemon process may have died or was shut down.
>>
>> I have thrown a lot of information at you here, but do you actually have figures on what the memory usage of the Apache child worker processes grows to? Are you using any Apache modules for implementing caching in Apache?
>>
>> Graham
>>
>> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:
>>>
>>> Would suggest upgrading to mod_wsgi 4.3.0 if you can, as the error messages when there are communication problems between the Apache child worker process and the mod_wsgi daemon process have been improved.
>>>
>>> More comments below.
>>>
>>> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:
>>>
>>>> Hi Graham and everyone else
>>>>
>>>> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker.
>>>> I found that the server started returning 504 and then 503, and the following errors showed up.
>>>> I researched some issues related to it, and even added "WSGISocketPrefix /var/run/apache2/wsgi", but the issue still occurred.
>>>> I have no idea why it happened. Can anyone give some directions on this issue?
>>>>
>>>> Thanks!
>>>>
>>>> apache error log
>>>> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
>>>> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
>>>>
>>>
>>> This one can occur when the mod_wsgi daemon process crashes.
>>> There should be a segmentation fault error message or similar in the main Apache error log (not a VirtualHost specific log).
>>>
>>> It can also occur if there are incomplete requests still running when a mod_wsgi daemon process is shut down on being restarted due to the WSGI script file being touched, or if Apache was restarted. In the latter case, the mod_wsgi daemon process would have had to have been killed off by Apache before the Apache child worker process which was proxying to it had been. This can especially be the case if an Apache graceful restart was being done.
>>>
>>>> occasionally
>>>> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.
>>>>
>>>
>>> This can also be due to an Apache graceful restart being done while there were keep alive connections being handled from a HTTP client. In an Apache graceful restart, because of how Apache handles the mod_wsgi daemon processes, they don't have a graceful shutdown in the same way as Apache child worker processes.
>>>
>>> So what happens is the following:
>>>
>>> 1. Apache graceful restart is triggered.
>>> 2. Apache parent process sends SIGUSR1 to Apache child worker process to signal graceful shutdown.
>>> 3. Apache parent process sends SIGINT to mod_wsgi daemon processes to signal shutdown.
>>> 4. The mod_wsgi daemon processes complete their requests and restart. In the next incarnation of the mod_wsgi daemon processes after an Apache restart they expect a different path for the proxy socket, with the number at the end increasing based on Apache generation number.
>>> 5. The Apache child worker process, because it was in a graceful restart mode, operates on the understanding that it can keep handling any requests on a keep alive socket connection from a HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name that was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.
>>>
>>> The name of the proxy socket changes across Apache restarts because otherwise you could have Apache child worker processes under an old configuration sending requests to a mod_wsgi daemon process using the new configuration, which could cause problems including security issues. There are therefore specific protections in place to ensure that only Apache child worker processes and mod_wsgi daemon mode processes created against the same Apache configuration generation can talk to each other.
>>>
>>>> wsgi config for that site
>>>> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
>>>> WSGIProcessGroup site-1
>>>> WSGIApplicationGroup %{GLOBAL}
>>>> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py
>>>>
>>>> worker.conf
>>>> <IfModule mpm_worker_module>
>>>> StartServers 2
>>>> MinSpareThreads 25
>>>> MaxSpareThreads 75
>>>> ThreadLimit 64
>>>> ThreadsPerChild 25
>>>> MaxRequestWorkers 150
>>>> MaxConnectionsPerChild 1000
>>>> </IfModule>
>>>>
>>>
>>> So my best guess is that you are doing Apache graceful restarts when these are occurring.
>>>
>>> Are you using Apache graceful restarts as suspected?
>>>
>>> Graham
>>>
>> --
>> You received this message because you are subscribed to the Google Groups "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/modwsgi.
>> For more options, visit https://groups.google.com/d/optout.
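
To tell the failure modes discussed in this thread apart when they show up in an error log, a minimal sketch along the following lines might help. The patterns are taken only from the three log excerpts quoted in the thread; the category labels, function names and sample list are hypothetical. Since mod_wsgi 4.3.0 is described above as changing the wording of some messages, the patterns may need adjusting for other versions.

```python
import re
from collections import Counter

# Patterns copied from the wording of the log excerpts in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("daemon_timeout",
     re.compile(r"Timeout when reading response headers from daemon process")),
    ("connect_errno_11",
     re.compile(r"\(11\)\s*Resource temporarily unavailable:"
                r".*Unable to connect to WSGI daemon process")),
    ("connect_errno_2",
     re.compile(r"\(2\)\s*No such file or directory:"
                r".*Unable to connect to WSGI daemon process")),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarise(lines):
    """Count how often each known message type occurs in the given lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# The three excerpts from the thread, each joined back onto a single line.
SAMPLE_LINES = [
    "[Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] "
    "[client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from "
    "daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py",
    "[Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] "
    "(11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi "
    "(pid=27816): Unable to connect to WSGI daemon process 'site-1' on "
    "'/var/run/apache2/wsgi.17227.2.3.sock'.",
    "[Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] "
    "(2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi "
    "(pid=24158): Unable to connect to WSGI daemon process "
    "'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.",
]

for label, count in sorted(summarise(SAMPLE_LINES).items()):
    print(label, count)
```

Per the explanations quoted above, the errno 2 case points towards a proxy socket path that no longer exists after a graceful restart, while the errno 11 case is the one described as possibly following a daemon process crash or shutdown. Counting them separately over time may help show which situation is actually recurring.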
