On 15 April 2011 05:18, Chase wrote:
> I have a custom Django app that's becoming unresponsive
> intermittently: about once every couple of days across three servers,
> serving about 10,000 requests a day. When it happens, it never
> recovers. I can leave it there for hours, and it will not serve any
> more requests.
>
> In the Apache logs, I see the following:
>
> Apr 13 11:45:07 www3 apache2[27590]: **successful view render here**
> ...
> Apr 13 11:47:11 www3 apache2[24032]: [error] server is within MinSpareThreads of MaxClients, consider raising the MaxClients setting
> Apr 13 11:47:43 www3 apache2[24032]: [error] server reached MaxClients setting, consider raising the MaxClients setting
> ...
> Apr 13 11:50:34 www3 apache2[27617]: [error] [client 10.177.0.204] Script timed out before returning headers: django.wsgi
> (repeated exactly 100 times)
>
> I am running:
>
> Apache 2.2, using the worker MPM
> mod_wsgi 2.8
> SELinux NOT installed
> lxml package used, infrequently
> Ubuntu 10.04
>
> Apache config:
>
> WSGIDaemonProcess site-1 user=django group=django threads=50
> WSGIProcessGroup site-1
> WSGIScriptAlias / /somepath/django.wsgi
>
> WSGI config:
>
> import os, sys
> sys.path.append('/home/django')
> os.environ['DJANGO_SETTINGS_MODULE'] = 'myapp.settings'
> import django.core.handlers.wsgi
> application = django.core.handlers.wsgi.WSGIHandler()
>
> When this happens, I can kill the WSGI process and the server will
> recover.
>
> $ ps aux|grep django   # process is running as user "django"
> django 27590 5.3 17.4 908024 178760 ? Sl Apr12 76:09 /usr/sbin/apache2 -k start
> $ kill -9 27590
>
> This leads me to believe that the problem is a known issue:
>
> "(deadlock-timeout) Defines the maximum number of seconds allowed to
> pass before the daemon process is shutdown and restarted after a
> potential deadlock on the Python GIL has been detected. The default is
> 300 seconds.
> This option exists to combat the problem of a daemon
> process freezing as the result of a rogue Python C extension module
> which doesn't properly release the Python GIL when entering into a
> blocking or long running operation."
>
> However, I'm not sure why this condition is not clearing
> automatically. I do see that the script timeout occurs exactly 5
> minutes after the last successful page render, so the deadlock-timeout
> is getting triggered. But it does not actually kill the process.
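The deadlock-timeout quoted above is, judging from its description, a check on whether the GIL can still be acquired. The following is a simplified pure-Python model of that idea, not mod_wsgi's actual C implementation; the function name heartbeat_survives_blocked_worker is hypothetical. A worker thread blocks forever on a lock, which releases the GIL in the same way lxml does before deadlocking in C code, while a heartbeat thread standing in for the monitor keeps ticking because the GIL is still free. A check of this kind therefore never fires, even though the worker, like the stuck request threads, never finishes.

```python
import threading
import time


def heartbeat_survives_blocked_worker(duration=0.5):
    """Hypothetical model of a GIL-based deadlock check (not mod_wsgi code).

    Returns (number_of_heartbeats, worker_still_blocked).
    """
    stuck = threading.Lock()
    stuck.acquire()  # held here and not released until the window ends

    def worker():
        # Stands in for C code that released the GIL and then deadlocked.
        # Blocking on a lock releases the GIL, so other threads keep running.
        stuck.acquire()

    beats = []
    stop = threading.Event()

    def heartbeat():
        # Stands in for the deadlock monitor: each tick needs the GIL,
        # so as long as it keeps ticking, a "GIL deadlock" is never detected.
        while not stop.is_set():
            beats.append(time.monotonic())
            time.sleep(0.01)

    w = threading.Thread(target=worker, daemon=True)
    h = threading.Thread(target=heartbeat, daemon=True)
    w.start()
    h.start()
    time.sleep(duration)
    stop.set()
    h.join()
    worker_still_blocked = w.is_alive()
    stuck.release()  # let the model clean up; a real deadlock never does this
    w.join()
    return len(beats), worker_still_blocked


if __name__ == "__main__":
    count, blocked = heartbeat_survives_blocked_worker()
    print(f"heartbeats: {count}, worker still stuck: {blocked}")
```

Running it prints a nonzero heartbeat count together with "worker still stuck: True": the monitor sees a healthy GIL while the request thread is hung.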
They likely aren't being killed because there isn't actually a deadlock
of a single thread which hasn't released the GIL. In other words, what
the deadlock timeout will not protect against is threads calling into C
code, releasing the GIL and then deadlocking in that C code.

In your case, the problem is going to be the lxml module. This module is
known not to work properly in Python sub interpreters. Specifically, lxml
can release the GIL and then attempt to make a callback into Python code.
To do this, it uses the simplified GIL state API in Python to reacquire
the GIL, but that API is only supposed to be used when running in the
main Python interpreter and not in a sub interpreter. When used in a sub
interpreter, the code will deadlock trying to reacquire the Python GIL.

That lxml is a problem is documented in:

http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Multiple_Python_Sub_Interpreters

The solution, since you are only delegating one application to that
mod_wsgi daemon process group, is to add:

  WSGIApplicationGroup %{GLOBAL}

This will force the application to run in the main Python interpreter
and avoid the shortcomings of the lxml module.

As for how you might protect against this sort of deadlock in C code
when the GIL isn't held, the only way is to use 'inactivity-timeout' (an
option to WSGIDaemonProcess). This will cause a restart when there have
been no new requests and/or no reading of request content or generation
of response content for that timeout period. So this could be used as a
fail-safe, but if your application is used infrequently, it will also
have the effect of causing your idle process to be restarted after the
timeout period.
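Once WSGIApplicationGroup %{GLOBAL} is in place, a small test application can confirm that requests really are handled in the main interpreter. This sketch relies on mod_wsgi putting the application group name into the WSGI environ as 'mod_wsgi.application_group', where an empty string means the main interpreter. It assumes the script is served temporarily from the same daemon process group (site-1 here) and under the same WSGIApplicationGroup setting as the Django handler, at a URL of your choosing.

```python
def application(environ, start_response):
    """Minimal WSGI app reporting which Python interpreter handles the request."""
    # mod_wsgi sets this key; '' means the main interpreter (%{GLOBAL}).
    group = environ.get('mod_wsgi.application_group')
    if group == '':
        body = 'main interpreter (application group is empty)\n'
    elif group is None:
        # Key missing: not running under mod_wsgi at all.
        body = 'not running under mod_wsgi\n'
    else:
        # Any other value names a sub interpreter, where lxml can deadlock.
        body = 'sub interpreter: %r\n' % group
    output = body.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(output)))])
    return [output]
```

If the response names a sub interpreter, the WSGIApplicationGroup directive is not applying to the context the request is served from.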
BTW, in the worst case, to find out what a process is doing, you can use
either:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB

> I'm thinking of switching to MPM/prefork, but I'm not sure if that
> should have any effect, given that I'm in daemon mode already.

Prefork has been causing subtle problems for some people, and I would
avoid it if you can.

Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
