On 20 November 2011 14:08, Rodrigo Campos <[email protected]> wrote: > Hi, > > I have an application that is freezing at some random times. It > doesn't seem to be obviously related with anything. And restarting > apache seems to fix it. > > My first guess was to lower the "maximum-requests" used in WSGI from > 500 to 200 to avoid any leak that we might be hitting (open fds, mem, > or something like that), change the LogLevel from warning to info in > apache config and add "inactivity-timeout=600". Just in case, the WSGI > relevant parts in the apache vhost looks like this now: > > WSGIDaemonProcess <name> processes=4 maximum-requests=200 threads=1 > inactivity-timeout=600 > WSGIProcessGroup <some_user> > WSGIScriptAlias / /<path>/apache.wsgi > WSGIApplicationGroup %{GLOBAL} > > > But with this same config, it happened again today. On the error logs > during that time there several errors like this one: > > [Sat Nov 19 02:59:28 2011] [error] [client 127.0.0.1] Script timed > out before returning headers: apache.wsgi > > In fact, there are 47 of them. And then 3 like this: > > [Sat Nov 19 08:02:25 2011] [info] mod_wsgi (pid=11550): Daemon > process inactivity timer expired, stopping process '<name>'. > > And 1 like: > > [Sat Nov 19 08:03:06 2011] [info] mod_wsgi (pid=18334): Maximum > requests reached '<name>'. > [Sat Nov 19 08:03:06 2011] [info] mod_wsgi (pid=18334): Shutdown > requested '<name>'. > [Sat Nov 19 08:03:06 2011] [info] mod_wsgi (pid=18334): Stopping > process '<name>'. > [Sat Nov 19 08:03:06 2011] [info] mod_wsgi (pid=18334): Destroying > interpreters. > [Sat Nov 19 08:03:06 2011] [info] mod_wsgi (pid=18334): Cleanup > interpreter ''. > [Sat Nov 19 08:03:07 2011] [info] mod_wsgi (pid=18334): Terminating Python. > [Sat Nov 19 08:03:07 2011] [info] mod_wsgi (pid=18334): Python has shutdown. > > > After this it all seems to be normal and working again. So it seems > the inactivity time out parameter worked as expected, it wasn't > necessary to restart apache this time. > > Also, it seems there is something wrong with the date/hour I didn't > yet look. It didn't happen so much time between the "Script timed out" > and the killing. It seems some messages are logged using one time and > some others using other timezone/something like that. Because I have > this in the logs, for example:
The timezone difference will be because one log message comes from Apache child process and the other from the mod_wsgi daemon process. If Django is overriding timezone then what is displayed will be different to what Apache child processes show. > [Sat Nov 19 08:04:31 2011] [info] mod_wsgi (pid=13853): Aborting > process '<name>'. > [Sat Nov 19 03:04:31 2011] [error] [client 127.0.0.1] Premature end of > script headers: apache.wsgi > [Sat Nov 19 08:04:31 2011] [info] mod_wsgi (pid=18356): Python has shutdown. > [Sat Nov 19 03:04:32 2011] [info] mod_wsgi (pid=18404): Attach interpreter ''. > > > But the time is actually bad configured on that server (is not on our > timezone and I will fix it as soon I can confirm the application > wouldn't have any problems with that), so perhaps it is reported in > some place as one timezone and some other in some other config file. > > This application uses a Mysql (percona, a fork of mysql actually) > hosted on some other machine. Looking the munin and new-relic stats I > don't see anything weird. Although in muning, with the mysql plugin, I > see a huge increase for some minutes of "binlog cache usage". That's > the only "anomaly" I see. > > Anyone has any idea/tip on how can I further debug this ? > > I've seen there are commits in WSGI (not included in any release yet) > that add a blocked timeout and blocker request parameters Yep. The inactivity-timeout has been serving double duty. It was really intended for when the process is completely idle with no requests arriving or being handled. In place of a better solution at the time, also made it so that it would detect case where no requests arriving but there were existing requests and for the total of those active requests there was no input or output. So, worked as a poor failsafe for when the process as a whole hung because of all requests blocking for some reason. In mod_wsgi 4.0 those concepts are properly separated. The inactivity-timeout is only for case where process is completely idle, no active requests at all. The new options for the busy case are blocked-requests and blocked-timeout. By default the blocked restart is not enabled. It can be enabled by setting blocked-timeout in much the same was as inactivity-timeout option. When blocked-requests is not set, it defaults to what ever the number of threads is for the mod_wsgi daemon process. What happens is that when the blocked-timeout is defined, when the number of concurrent requests is equal to or exceeds blocked-requests, if there is no input or output for any requests for period set by blocked-timeout, then the process will be restarted. So, if blocked-requests isn't set and defaults to total threads, then when whole process effectively hangs, then it will restart. If blocked-requests is set lower, for example, to half total threads, then behaviour is a bit different. In this case the check for input and output starts at a lower number of potentially blocked requests. So, the checks for no activity against blocked timeout period will happen, but not all threads will be in blocked state at this point and so new requests could still be accepted and handled. If it so happened that you had a lull period where no requests arrived and so timeout expired with the requests still blocked, then the process will be restarted. Setting blocked-requests to a lower value than total number of threads therefore allows you to potentially have some head room where can still handle requests but will try and do a restart if can in an idle period. So, a form of graceful restart of sorts. In practice though, if you are under a constant request load, it is unlikely this will occur. I need to think about how this all works some more and see whether I can better determine if individual requests are producing no input and output and so likely blocked. I may yet be able to come up with a slightly better way of handling this and cause a restart sooner as soon as no other active requests in addition to the blocked ones. That way will not have situation where in busy site will still end up reaching maximum number of threads being blocked before forced restart. > and an other > one to get some information of the processes/threads (which > information ? This is some experimental stuff that works in conjunction with New Relic agent to track number of concurrent requests and thread utilisation. It can be graphed in New Relic using custom views. I am not entirely happy with interface into mod_wsgi for this at this point and so will be changing how this works. > a call-trace perhaps ? :-)). I have just committed some changes that adds a feature where by if a restart is performed due to blocked requests then it will dump out what any active requests in the process are doing. You would therefore see in the Apache logs: [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Daemon process busy inactivity timer expired, stopping process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Shutdown requested 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Dumping stack trace for active Python threads. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread 4320296960 executing file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in function [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12, in application. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread 4320833536 executing file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in function [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12, in application. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread 4321370112 executing file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in function [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12, in application. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread 4313858048 executing file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in function [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12, in application. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 10 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 9 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 5 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 8 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 7 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 6 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 4 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 3 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 2 in daemon process 'hello-1'. [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread 1 in daemon process 'hello-1'. [Mon Nov 21 15:18:57 2011] [info] mod_wsgi (pid=12256): Aborting process 'hello-1'. [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of script headers: environ.wsgi [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of script headers: environ.wsgi [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of script headers: environ.wsgi [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of script headers: environ.wsgi It does require LogLevel in Apache to be set to 'info' though. There is only file name, line number and function name context. It isn't really feasible to give code line like normal Python tracebacks as this is reaching from underneath all interpreters and written in C code. It is happening on shutdown and also has to avoid doing stuff that might result in GIL being released and other code being run. So, has to be simple as possible. Should be sufficient though in helping to target trouble spot. > If I understand correctly > this could be *very* helpful to debug this problem. Also, if we could > have a trace where the blocked process where hanging it will be *very > very* useful :-D > > Is there some other configuration option that could help ? Do you know > when will be a new release that includes the "blocked request" > parameters ? Above is all I can think of right now in rush before have to go back home to no proper internet. Graham > Oh, btw, I'm using libapache2-mod-wsg 3.3-2ubuntu2 > > > Also, if you need more information, please let me know. > > > > > > Thanks a lot, > Rodrigo > > -- > You received this message because you are subscribed to the Google Groups > "modwsgi" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/modwsgi?hl=en. > > -- You received this message because you are subscribed to the Google Groups "modwsgi" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
