Your operating system could be signalling Apache during the middle of the night so that the log files can be rotated.
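On Ubuntu one common cause is the packaged logrotate configuration for Apache: after rotating the logs it reloads Apache, and a reload is a graceful restart. The exact contents of /etc/logrotate.d/apache2 vary between releases, but it is typically something along these lines (a sketch only, not necessarily your exact file):

/var/log/apache2/*.log {
        daily
        rotate 14
        missingok
        notifempty
        compress
        delaycompress
        sharedscripts
        postrotate
                # this reload is what triggers the nightly graceful restart
                if /etc/init.d/apache2 status > /dev/null; then
                        /etc/init.d/apache2 reload > /dev/null
                fi
        endscript
}

If the times of the graceful restarts in the main Apache error log line up with when logrotate runs from cron, that would explain restarts happening while everyone is asleep.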
Graham

On 31/10/2014, at 6:36 PM, Kelvin Wong <[email protected]> wrote:

> I checked the memory of the overall Apache process; it goes up and down a little. It seems the "maximum-requests" option is doing its job.

> I found that when I do "sudo service apache reload", Apache does a graceful restart, as shown in the main Apache error log.

> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.

> Before the situation above happened, Apache was also doing a graceful restart. But this should not have been triggered by anyone because, as I mentioned, it happened during sleep time. Is there any reason Apache would do a graceful restart by itself?

> Thanks.

> On Friday, October 31, 2014 2:55:32 PM UTC+8, Graham Dumpleton wrote:

> BTW, in your NR agent configuration file find the error_collector.ignore_errors setting and change it to:

> error_collector.ignore_errors = django.http.request:UnreadablePostError

> These indicate the HTTP client connection dropping before all content had been sent.

> Generally you want to ignore these and eliminate them from your error count, as they do not represent an error in your application but are the result of a client problem.

> Adding that configuration setting will cause them to be ignored and not recorded as an error.

> Graham

> On 31/10/2014, at 5:41 PM, Graham Dumpleton <[email protected]> wrote:

> On 31/10/2014, at 5:04 PM, Kelvin Wong <[email protected]> wrote:

> Thanks for the details! I now have a better picture of how it works. I have applied all the recommendations you suggested and will keep monitoring the processes.

> But what about the graceful restart signalled by Apache? Why did it happen?

> As far as the message:

> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.

> goes, as I said, I believe that could only be the result of an Apache graceful restart.

> My recollection is that a 'reload' under a Linux system using its process management tools triggers an Apache graceful restart, whereas a 'restart' does a normal Apache restart. I believe you said you used 'reload', and so that would be the source of that.

> The whole reason I have been trying to sort out which processes actually had memory issues was to avoid MaxConnectionsPerChild and to ensure that it wasn't a source of the problem either.

> Now that we have you on a better configuration, let's see whether it happens again, rather than worrying too much about what happened in the past. Just note when you happen to do a 'reload', so that if the errors are seen at that time you understand that was likely the cause.

> Once things settle down for a while, we can then look at your processes/threads configuration and capacity utilisation and work out whether a poor configuration there is causing problems. We can also see whether you are getting stuck requests which are eating into capacity and which could result in some of the errors you were seeing when any sort of restart was done.

> So slowly, slowly: resolve one issue at a time.
> Changing too many things at once will only confuse matters, and otherwise you will not get a sense of which changes are actually making things better.

> Graham

> Thanks again.

> On Friday, October 31, 2014 12:29:12 PM UTC+8, Graham Dumpleton wrote:

> So let's look at just memory initially.

> When we talk about Apache processes, there are actually going to be three different classes of processes. These are:

> 1. The Apache parent process.

> This runs as root. Because it runs as root, and not as the Apache user (www-data) that the Apache child worker processes run as, it can be seen separately in the NR server monitoring.

> There is no memory increase here. It is actually very important that the Apache parent process doesn't increase in size over time with restarts, as when it forks child processes they would inherit that increased memory size. One of the reasons that using mod_python is a bad idea is that it used to initialise the Python interpreter in the Apache parent process; when restarted, the Python interpreter would leak memory and so the Apache parent process would grow in size on each restart.

> Anyway, you are good here and there is nothing to worry about.

> 2. The Apache child worker processes.

> These run as the Apache user, which on your system is www-data.

> The Apache child worker processes are what accept all inbound HTTP requests. Static files will be served up by these processes.

> In your case, where you are using mod_wsgi daemon processes, the Python web application does not run in the Apache child worker processes. The Apache child worker processes still accept the requests initially, though, and then proxy each request through to the daemon process running the WSGI application.

> 3. The mod_wsgi daemon processes.

> By default these also run as the Apache user, which on your system is www-data. This is where your WSGI application is running.

> Now, both the Apache child worker processes and the mod_wsgi daemon processes are forked children of the Apache parent process. They will therefore all show as the process 'apache2' on your system. Because both (2) and (3) run as the Apache user, NR server monitoring will not allow you to distinguish between them.

> Thus when you get:

> [chart]

> you aren't looking at just the Apache child worker processes; you are also looking at the mod_wsgi daemon processes.

> The only way you could use NR server monitoring to tell the two apart is if you used the 'user' option to WSGIDaemonProcess and ran the mod_wsgi daemon processes as a different user again. Then you could distinguish them.

> Although you can't distinguish them using NR server monitoring, you can on Linux if you ensure you are naming the processes. That is, when using the WSGIDaemonProcess directive, supply the display-name=%{GROUP} option. This will allow you to see the mod_wsgi daemon processes as distinct if using 'ps' or 'htop'.
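As a sketch of that naming suggestion (the group name 'site-1' is simply the one used elsewhere in this thread; keep whatever other options you already have on the directive):

# Name the daemon processes after their group so they can be told apart
# from the Apache child worker processes in 'ps' or 'htop'.
WSGIDaemonProcess site-1 display-name=%{GROUP} threads=25
WSGIProcessGroup site-1

With display-name=%{GROUP} the daemon processes should show up in process listings under a name along the lines of '(wsgi:site-1)' rather than just 'apache2'.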
> Let's instead now look at what the mod_wsgi plugin tells you.

> Over the 24 hours matching the last charts we get:

> [chart]

> so nothing conclusive. For a shorter period of one hour we get:

> [chart]

> This shows an increase over time, but you have to be a little bit careful in interpreting that. If you hover over the memory chart and look at the popup you will see:

> [chart]

> This shows that the processes actually range in size. There is a range of sizes because the process churn chart shows that processes are being restarted at different times.

> What is important, though, is to look at the max value. This is the largest size of any process. If one moves along the chart for that period, one sees that the maximum is always about 12.9-13.0 MB.

> This indicates to me that the Apache child worker process size has reached a plateau, and that this is the normal behaviour I believe I noted previously, where Apache child processes grow over time up to a plateau.

> So for the Apache child worker processes this isn't ringing any alarm bells for me.

> If you have already removed MaxConnectionsPerChild, and that maximum continues along without increasing too much more over time, then I don't see a need to be setting MaxConnectionsPerChild.

> The next chart to look at is the memory chart from monitoring the actual Python web application. Over the last 6 hours this shows:

> [chart]

> I am not 100% sure if you can get to this chart on the Lite subscription level you are on, but I can still see it. If it is available on the Lite subscription, go to the bottom of the overview dashboard and select the chart icon for the server breakout.

> So as far as which process is growing in memory size goes, it is very much the mod_wsgi daemon processes.

> The problem therefore is going to be an issue with caching in your application, construction of object cycles that cannot be broken by the garbage collector, or, in the worst case, the garbage collector not even running due to it deadlocking.

> As a short term remedy, to at least keep the memory growth under control for the mod_wsgi daemon processes, you have two choices.

> The first is to use the 'inactivity-timeout' option to the WSGIDaemonProcess directive. The behaviour of this is slightly different between mod_wsgi 3.X and 4.X.

> Under 4.X, what will happen when inactivity-timeout is set is that if no requests at all are received by a daemon process group, and there are no active ones, for the specified period, then the process will be restarted.

> You run up to 150 requests per minute across about 7 process instances. In practice this means that any one process is not even handling one request per second.

> Even so, what it means is that if you set this to 30 seconds, a process is never going to be idle long enough to be restarted. You therefore can't use this.

> On mod_wsgi 3.X the difference was that the reading of request content and the writing of response content were monitored, and if there was no activity on those after the timeout period, then the process would be restarted. With the way it worked, it meant that a long running request, or stuck requests, would be interrupted. In mod_wsgi 4.X you should be using request-timeout instead if the intent is to be able to interrupt long running or stuck requests.
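If interrupting stuck requests does become necessary later, a minimal sketch of that under mod_wsgi 4.X would be to add request-timeout to the existing directive (the 60 second figure is purely illustrative, not a value recommended in this thread):

# mod_wsgi 4.X only: have the daemon process restarted if requests appear
# to have been stuck for longer than about 60 seconds. Other options on the
# directive stay as they already are.
WSGIDaemonProcess site-1 display-name=%{GROUP} threads=25 request-timeout=60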
> The second option you can use instead of inactivity-timeout is the maximum-requests option to WSGIDaemonProcess. This has also changed a bit in mod_wsgi 4.X from 3.X.

> Under 3.X, once that many requests had been processed by a process, the process would be restarted. In restarting, though, it followed the typical shutdown sequence for mod_wsgi daemon processes. That is, no new requests would be accepted on that process. Currently running requests would be given a grace period to complete (default 2.5 seconds). If they did not complete in that time, the process would be shut down more brutally.

> The result of this was that long running requests could be interrupted, resulting in some of those errors you saw before.

> In mod_wsgi 4.X this option has the same default behaviour, but there is a new way of having the restart be more graceful. What you use instead is both the maximum-requests and graceful-timeout options to WSGIDaemonProcess at the same time.

> When you also have the graceful-timeout option specified, what will happen when maximum-requests is reached is that an initial grace period is entered, based on what graceful-timeout is set to. During this initial grace period, new requests will still be accepted, but if at any point during that time there are no active requests, the process will be restarted immediately.

> If for some reason the initial grace period expires, then the normal restart sequence starts. That is, stop accepting requests, provide a further 2.5 seconds grace and then brutally restart.

> So this new graceful-timeout option, in conjunction with maximum-requests, means that a process can be restarted with a better ability to do so without interrupting any requests.

> In your case, because there is less than 1 request per second on average across all processes, there should be a lot of opportunity for a restart to occur in the gaps between those infrequent requests.

> My recommendations at this point are:

> 1. Set MaxConnectionsPerChild to 0 so that Apache child worker processes aren't being recycled unnecessarily.

> Monitor the mod_wsgi plugin chart 'Average Memory Used (bytes/process)' to ensure that the Apache child worker processes aren't growing in size in an unbounded way above the typical max you are seeing now. They should simply plateau after a while. Keep an eye on the max in the popup rather than the average, as the latter can be deceiving because Apache child worker processes can still be restarted by Apache due to its dynamic scaling feature, thereby confusing the results you see.

> 2. Add to WSGIDaemonProcess the maximum-requests and graceful-timeout options.

> As you are running only about 20 requests per minute per process, set maximum-requests to about 250 initially, which would mean a restart roughly every 10 minutes or so at that rate. If the processes are getting recycled too often and the startup cost is too much, then increase it. At the same time, though, keep an eye on that memory chart in the server breakout on the main dashboard to ensure memory is not growing too much before the process gets recycled.

> At the same time, make sure you set graceful-timeout. A safe figure for this initially would be 30. That is the number of seconds to wait for a moment with no active requests in which to restart, before falling back to a more brutal restart.

> 3. If you haven't done it already, and you are definitely only using mod_wsgi daemon mode for running your WSGI application, ensure you are setting WSGIRestrictEmbedded to On at global scope in the Apache configuration, outside of any VirtualHost.

> So as an initial remedial action only, to combat memory growth, do that for now. This will give you some breathing space so you can work out why your web application is growing in memory usage continually.
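Pulling those three recommendations together, a minimal sketch of the relevant configuration pieces (the group name, thread count and MPM figures are simply the ones quoted elsewhere in this thread, and maximum-requests=250 / graceful-timeout=30 are the starting points suggested above, not magic numbers):

# Global scope, outside any VirtualHost: only daemon mode is in use, so
# don't initialise a Python interpreter in the Apache child worker processes.
WSGIRestrictEmbedded On

<IfModule mpm_worker_module>
    StartServers 2
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadLimit 64
    ThreadsPerChild 25
    # 400 / 25 threads per child = up to 16 child processes, which matches
    # the worker MPM's default ServerLimit of 16; going higher than 400
    # would also mean raising ServerLimit.
    MaxRequestWorkers 400
    # Recommendation 1: stop recycling the Apache child worker processes.
    MaxConnectionsPerChild 0
</IfModule>

# Recommendation 2, on the existing WSGIDaemonProcess line for the site
# (keep your existing python-path and any other options as they are).
WSGIDaemonProcess site-1 display-name=%{GROUP} user=www-data threads=25 maximum-requests=250 graceful-timeout=30
WSGIProcessGroup site-1
WSGIApplicationGroup %{GLOBAL}

At roughly 150 requests per minute spread over about 7 daemon processes, each process sees on the order of 20 requests a minute, so maximum-requests=250 works out to a recycle roughly every 10-12 minutes.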
> When done, and when it has been running for 6-12 hours, come back and we can review how the memory growth charts are looking. I can also get to some other charts that you can't on the Lite subscription, which will show details about the mod_wsgi daemon process restarts, so we can see how far apart they are and whether we need to adjust maximum-requests.

> As I say though, this is just to get you in a better position while you work out your memory leakage problems.

> A next step I would like to take is to have you install an extension to the NR Python agent which allows the activities of the garbage collector to be monitored, to make sure it is running and to see whether uncollectable objects are accumulating. You would not normally be able to use this with a Lite account, but I will be able to see the data. Before I can give you that extension to try I need to wait on some internal approvals to release it, which are already in train due to us needing to provide it to some other users who are seeing memory growth issues we are trying to help them with.

> For interest's sake, the charts it can produce are:

> [charts]

> Anyway, work slowly through what I have said above and understand it. Get those few configuration changes made and then we will look at the next steps.

> Graham

> On 31/10/2014, at 1:12 PM, Kelvin Wong <[email protected]> wrote:

> I'm sorry for throwing so many questions at you. I just want to provide more information to make it easier to find the cause. Also, I rushed to you because this situation happened during sleep time, so I'm a little stressed about it.

> Back to your question. After following your instructions to set up the mod_wsgi plugin to monitor the mod_wsgi processes in New Relic, I took a look at the data. It seemed normal to me. Before the 503 situation, "Average Memory Used (bytes/process)" and "Process Count" did not show any great change. But looking at the data from "Server", the Apache processes are still accumulating memory. So, can we conclude that the Apache child worker processes are growing in memory?

> Also, as I posted previously, in the main Apache error log:

> [Wed Oct 29 12:56:26.727197 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00287: server is within MinSpareThreads of MaxRequestWorkers, consider raising the MaxRequestWorkers setting
> [Wed Oct 29 12:56:30.730902 2014] [mpm_worker:error] [pid 1322:tid 139958218430336] AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

> I changed the config accordingly, but the 503 situation still happened.

> <IfModule mpm_worker_module>
> StartServers 2
> MinSpareThreads 25
> MaxSpareThreads 75
> ThreadLimit 64
> ThreadsPerChild 25
> MaxRequestWorkers 400
> MaxConnectionsPerChild 0
> </IfModule>

> Thanks.

> On Friday, October 31, 2014 3:30:58 AM UTC+8, Graham Dumpleton wrote:

> Before I can help you, you need to start answering my questions. If you pepper me with email after email on different things and don't answer the questions I ask you, then we will get nowhere. When I ask questions it is because that information is likely key to me being able to understand the problem and help you.

> The main question I put to you was about how you are determining which processes are growing in memory. It needs to be properly determined whether the processes which were growing in memory were Apache child worker processes or mod_wsgi daemon processes.

> The reason for this is that if they are in fact mod_wsgi daemon processes, then you are possibly in part causing the problem yourself by setting MaxConnectionsPerChild.
> That setting has got nothing to do with the mod_wsgi daemon processes, and by setting it you are creating the conditions which cause the sorts of errors you are seeing.

> So, which processes are growing in memory?

> Graham

> On 31/10/2014, at 12:40 AM, Kelvin Wong <[email protected]> wrote:

> I found in the main Apache log that, when this is occurring, Apache is doing a graceful restart. How can I avoid the Apache graceful restart?

> Thanks.

> On Tuesday, October 28, 2014 1:30:06 PM UTC+8, Graham Dumpleton wrote:

> I would suggest upgrading to mod_wsgi 4.3.0 if you can, as the error messages for communication problems between the Apache child worker processes and the mod_wsgi daemon processes have been improved.

> More comments below.

> On 28 October 2014 15:43, Kelvin Wong <[email protected]> wrote:

> Hi Graham and everyone else,

> I'm running multiple sites on Django 1.6.7, Apache/2.4.7 (Ubuntu 14.04), OpenSSL/1.0.1f, mod_wsgi/4.2.5, Python/2.7.6, Server MPM: worker. I found that the server starts returning 504 and then 503, and the following errors show up. I researched some related issues and even added "WSGISocketPrefix /var/run/apache2/wsgi", but the issue still occurred. I have no idea why it happened. Can anyone give some direction on this issue?

> Thanks!

> apache error log
> [Sun Oct 26 07:34:34.732934 2014] [wsgi:error] [pid 29268:tid 140053011478272] [client xx.xxx.xxx.xxx:xxxxx] Timeout when reading response headers from daemon process 'site-1': /home/ubuntu/site-1/apache/wsgi.py
> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.

> This one can occur when the mod_wsgi daemon process crashes. There should be a segmentation fault error message or similar in the main Apache error log (not the VirtualHost specific log).

> It can also occur if there are incomplete requests still running when a mod_wsgi daemon process is shut down on being restarted, due to the WSGI script file being touched or Apache being restarted. In the latter case, the mod_wsgi daemon process would have had to have been killed off by Apache before the Apache child worker process which was proxying to it had been. This can especially be the case if an Apache graceful restart was being done.

> occasionally
> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 'snaptee-production-api-ssl' on '/var/run/apache2/wsgi.30188.7.3.sock'.

> This one can also be due to an Apache graceful restart being done while there were keep-alive connections from an HTTP client still being handled. In an Apache graceful restart, because of the way Apache handles the mod_wsgi daemon processes, they don't have a graceful shutdown in the same way as the Apache child worker processes.

> So what happens is the following:

> 1. An Apache graceful restart is triggered.
> 2. The Apache parent process sends SIGUSR1 to the Apache child worker processes to signal graceful shutdown.
> 3. The Apache parent process sends SIGINT to the mod_wsgi daemon processes to signal shutdown.
> 4. The mod_wsgi daemon processes complete their requests and restart.
> In the next incarnation of the mod_wsgi daemon processes after an Apache restart, they expect a different path for the proxy socket, with the number at the end increasing based on the Apache generation number.
> 5. The Apache child worker process, because it was in graceful restart mode, operates on the understanding that it can keep handling any requests on a keep-alive socket connection from an HTTP client until there are no more. It therefore takes the next request on the same connection and tries to connect to the mod_wsgi daemon process using the proxy socket name that was used before, but that name has changed for the next Apache configuration generation and no longer exists, thus it fails.

> The name of the proxy socket changes across Apache restarts because otherwise you could have Apache child worker processes under an old configuration sending requests to a mod_wsgi daemon process using the new configuration, which could cause problems, including security issues. There are therefore specific protections in place to ensure that only Apache child worker processes and mod_wsgi daemon mode processes created against the same Apache configuration generation can talk to each other.

> wsgi config for that site
> WSGIDaemonProcess site-1 display-name=site-1 user=www-data threads=25 python-path=/home/ubuntu/site-1/django:/home/ubuntu/.virtualenvs/site-1/lib/python2.7/site-packages
> WSGIProcessGroup site-1
> WSGIApplicationGroup %{GLOBAL}
> WSGIScriptAlias / /home/ubuntu/site-1/apache/wsgi.py

> worker.conf
> <IfModule mpm_worker_module>
> StartServers 2
> MinSpareThreads 25
> MaxSpareThreads 75
> ThreadLimit 64
> ThreadsPerChild 25
> MaxRequestWorkers 150
> MaxConnectionsPerChild 1000
> </IfModule>

> So my best guess is that you are doing Apache graceful restarts when these errors are occurring.

> Are you using Apache graceful restarts as suspected?

> Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.
