Can you clearly explain your original symptoms? The message 'Script
timed out before returning headers', which the subject of this
discussion gave, can occur in a number of circumstances, and some are
not related to deadlocks.

On 6 June 2011 05:24, rwman <[email protected]> wrote:
> Is there a way to make Apache work even when such a deadlock occurs?

When using daemon mode, mod_wsgi has for some time had the ability to
detect deadlocks, and it should kill off the process after 300 seconds.
This doesn't apply if using embedded mode, so when explaining your
original problem you should explain the configuration you are using
and preferably post the mod_wsgi bits from the Apache configuration.

There are some extreme cases where a third party Python extension
module might defeat the deadlock detection, but the extension module
would need to be doing things it probably shouldn't be doing. The
deadlock timeout also will not kick in if your code is simply looping
or stuck in database queries that take a long time.

> Can a process be killed and restarted automatically?

For true deadlocks, that is what the deadlock detection of daemon mode
does. There is also an optional inactivity timeout failsafe that can
be turned on, which helps to recover from non-deadlock cases where
request handlers are looping or stuck in database queries.

> I know it is not a
> solution for the actual problem, which should be solved by eliminating
> the deadlock, but the goal is to keep the production server working
> while debugging the problem.
> I tried all options of modwsgi that seemed relevant, but could not
> achieve a stable Apache configuration. It got stuck after some time for
> about 5 hours.

Without an explanation of your original problem, it isn't clear that
you are having a deadlock problem. It could be that you have request
handlers that are getting stuck in loops and never completing, thereby
using up all the request handler threads.

So, give your current configuration and what other variations you have
used, so we can see what you are doing and confirm whether you are
using embedded mode or daemon mode. Also indicate whether you are
using the Apache prefork or worker MPM and whether PHP is being used
in the same Apache web server.

Indicate whether you have looked at the inactivity-timeout option for
WSGIDaemonProcess and whether you have at least seen the
deadlock-timeout option, although the latter defaults to on anyway.
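
As a rough sketch only, assuming the 'site-1' daemon process group
from the configuration quoted further down, and with timeout values
picked purely for illustration, the two options could be set like this:

```apache
# Illustrative values only; tune them for your own application.
# deadlock-timeout already defaults to 300 seconds, so listing it
# here just makes the setting explicit.
WSGIDaemonProcess site-1 user=django group=django processes=3 threads=5 inactivity-timeout=120 deadlock-timeout=300
WSGIProcessGroup site-1
```

Keep in mind that inactivity-timeout will also restart a process that
is merely idle, so a very short value on a rarely used site means
frequent restarts.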

Also indicate whether you have tried adding any variant of the code as
explained in:

  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces

to try and get the daemon process to dump Python stack traces when it
does get stuck so you might work out what it is doing.
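
For illustration, here is a minimal sketch of the general idea. It is
not the code from that wiki page. It uses only the standard library's
sys._current_frames() and traceback.extract_stack(); the function name
format_thread_stacks is made up for this example, and how you trigger
it inside a stuck process (for instance from a background monitoring
thread) is deliberately left out.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    # Map thread identifiers to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        # extract_stack() walks from the given frame back to the outermost caller.
        for filename, lineno, function, text in traceback.extract_stack(frame):
            lines.append('  File "%s", line %s, in %s' % (filename, lineno, function))
            if text:
                lines.append("    %s" % text.strip())
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Under mod_wsgi, output written to sys.stderr goes to the Apache error log.
    sys.stderr.write(format_thread_stacks())
```

This only shows Python level frames. A thread that is stuck inside C
code will just show the Python line that called into it, which is why
the C stack traces mentioned next can also be needed.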

You could also try extracting C stack traces as explained in:

  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB

Graham

> On Apr 25, 2:51 am, Graham Dumpleton <[email protected]>
> wrote:
>> That many threads was never a good idea.
>>
>> A possible reason why you are seeing fewer problems with only 5 threads
>> in a process is that your code or a third party C extension is not
>> thread safe and is perhaps deadlocking.
>>
>> You really need to ascertain when process threads are starting to hang
>> and use:
>>
>>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_...
>>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...
>>
>> to work out what it is doing at that time.
>>
>> Graham
>>
>> On 24 April 2011 03:38, Chase <[email protected]> wrote:
>>
>> > Changed the config from 1 process, 50 threads to 3 processes, 5 threads.
>> > That seems to have solved it, or at least made it much less likely.
>>
>> >  -Chase
>>
>> > On Apr 16, 7:56 am, Chase <[email protected]> wrote:
>> >> The problem persists. I have removed our calls to lxml; they were not
>> >> critical. We'll see what effect that has going forward.
>>
>> >>    -Chase
>>
>> >> On Apr 16, 12:08 am, Graham Dumpleton <[email protected]>
>> >> wrote:
>>
>> >> > On 16 April 2011 01:04, Chase <[email protected]> wrote:
>>
>> >> > > Wow, lots of good info. Thanks guys! I have made the
>> >> > > "WSGIApplicationGroup %{GLOBAL}" change for now; we'll see if that
>> >> > > clears it up over the next week or so.
>>
>> >> > > As for running in prefork, I have not made that change yet. But here
>> >> > > is the documentation that led me to believe this was preferred:
>>
>> >> > > http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango
>>
>> >> > > "Now, traditional wisdom in respect of Django has been that it should
>> >> > > preferably only be used on single threaded servers. This would mean
>> >> > > for Apache using the single threaded 'prefork' MPM on UNIX systems and
>> >> > > avoiding the multithreaded 'worker' MPM."
>>
>> >> > > Also, the older modpython docs also advised this:
>>
>> >> > >http://docs.djangoproject.com/en/dev/howto/deployment/modpython/?from...
>>
>> >> > > "Django requires Apache 2.x and mod_python 3.x, and you should use
>> >> > > Apache’s prefork MPM, as opposed to the worker MPM."
>>
>> >> > > Can you link to a discussion of the subtle problems reported with
>> >> > > prefork? Thanks again,
>>
>> >> > That section was more relevant when Django 1.0 had only just come out,
>> >> > which was the first version of Django for which the core was
>> >> > supposedly thread safe.
>>
>> >> > Anyway, the MPM you use isn't particularly relevant as you are using
>> >> > daemon mode and not embedded mode. Which MPM you use is only critical
>> >> > if you are using embedded mode.
>>
>> >> > In daemon mode you have the arbitrary ability to control
>> >> > processes/threads based on whether your application is thread safe.
>>
>> >> > For related reading see:
>>
>> >> >  http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
>> >> >  http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usa...
>>
>> >> > BTW, the IntegrationWithDjango page in the wiki is likely to be
>> >> > completely removed at some point in the near future and I will stop
>> >> > providing details for specific frameworks to cover where frameworks
>> >> > don't themselves provide enough information. I have already removed
>> >> > the pages for most of the other frameworks already. End result is that
>> >> > the frameworks themselves will need to provide decent documentation
>> >> > themselves to cover any idiosyncrasies that exist in setting up their
>> >> > framework to work with mod_wsgi which are due to issues or design
>> >> > decisions related to their framework and which are nothing to do with
>> >> > mod_wsgi. I have had enough of trying to document these framework
>> >> > specific subtleties and framework authors tend to express a belief
>> >> > that their own documentation is already more than adequate even though
>> >> > from what I have seen people still get tripped up when they follow
>> >> > only the documentation provided by the framework. So, I will be
>> >> > devoting my time elsewhere now and not worrying about documenting
>> >> > stuff related to the frameworks or actively assisting users of
>> >> > frameworks on forums related to those frameworks or on general forums
>> >> > such as StackOverflow. Instead, if it is a framework specific issue,
>> >> > you will need to seek help from the developers or the community for
>> >> > that framework.
>>
>> >> > Graham
>>
>> >> > >   -Chase
>>
>> >> > > On Apr 14, 6:30 pm, Graham Dumpleton <[email protected]>
>> >> > > wrote:
>> >> > >> On 15 April 2011 05:18, Chase <[email protected]> wrote:
>>
>> >> > >> > I have a custom Django app that's becoming unresponsive
>> >> > >> > intermittently. About once every couple of days between three
>> >> > >> > servers, serving about 10,000 requests a day. When it happens, it
>> >> > >> > never recovers. I can leave it there for hours, and it will not
>> >> > >> > serve any more requests.
>>
>> >> > >> > In the apache logs, I see the following:
>>
>> >> > >> > Apr 13 11:45:07 www3 apache2[27590]: **successful view render here**
>> >> > >> > ...
>> >> > >> > Apr 13 11:47:11 www3 apache2[24032]: [error] server is within MinSpareThreads of MaxClients, consider raising the MaxClients setting
>> >> > >> > Apr 13 11:47:43 www3 apache2[24032]: [error] server reached MaxClients setting, consider raising the MaxClients setting
>> >> > >> > ...
>> >> > >> > Apr 13 11:50:34 www3 apache2[27617]: [error] [client 10.177.0.204] Script timed out before returning headers: django.wsgi
>> >> > >> > (repeated 100 times, exactly)
>>
>> >> > >> > I am running:
>>
>> >> > >> > apache version 2.2, using the worker MPM
>> >> > >> > wsgi version 2.8
>> >> > >> > SELinux NOT installed
>> >> > >> > lxml package being used, infrequently
>> >> > >> > Ubuntu 10.04
>>
>> >> > >> > apache config:
>>
>> >> > >> > WSGIDaemonProcess site-1 user=django group=django threads=50
>> >> > >> > WSGIProcessGroup site-1
>> >> > >> > WSGIScriptAlias / /somepath/django.wsgi
>>
>> >> > >> > wsgi config:
>>
>> >> > >> > import os, sys
>> >> > >> > sys.path.append('/home/django')
>> >> > >> > os.environ['DJANGO_SETTINGS_MODULE'] = 'myapp.settings'
>> >> > >> > import django.core.handlers.wsgi
>> >> > >> > application = django.core.handlers.wsgi.WSGIHandler()
>>
>> >> > >> > When this happens, I can kill the wsgi process and the server will
>> >> > >> > recover.
>>
>> >> > >> > >ps aux|grep django # process is running as user "django"
>> >> > >> > django   27590  5.3 17.4 908024 178760 ?       Sl   Apr12  76:09 /usr/sbin/apache2 -k start
>> >> > >> > >kill -9 27590
>>
>> >> > >> > This leads me to believe that the problem is a known issue:
>>
>> >> > >> > "(deadlock-timeout) Defines the maximum number of seconds allowed
>> >> > >> > to pass before the daemon process is shutdown and restarted after a
>> >> > >> > potential deadlock on the Python GIL has been detected. The default
>> >> > >> > is 300 seconds. This option exists to combat the problem of a daemon
>> >> > >> > process freezing as the result of a rogue Python C extension module
>> >> > >> > which doesn't properly release the Python GIL when entering into a
>> >> > >> > blocking or long running operation."
>>
>> >> > >> > However, I'm not sure why this condition is not clearing
>> >> > >> > automatically. I do see that the script timeout occurs exactly 5
>> >> > >> > minutes after the last successful page render, so the
>> >> > >> > deadlock-timeout is getting triggered. But it does not actually
>> >> > >> > kill the process.
>>
>> >> > >> They likely aren't being killed because there isn't actually a
>> >> > >> deadlock of a single thread which hasn't released the GIL.
>>
>> >> > >> In other words, what the deadlock timeout will not protect against
>> >> > >> is threads calling into C code, releasing the GIL and then
>> >> > >> deadlocking in C code.
>>
>> >> > >> In your case, the problem is going to be the lxml module. This module
>> >> > >> is known not to work in Python sub interpreters properly.
>> >> > >> Specifically, lxml can release the GIL and then attempt to do a
>> >> > >> callback into Python code. To do this, it uses the simplified GIL
>> >> > >> state API in Python to reacquire the GIL, but that API is only
>> >> > >> supposed to be used if running in the main Python interpreter and not
>> >> > >> a sub interpreter. When used in a sub interpreter, the code will
>> >> > >> deadlock on trying to reacquire the Python GIL.
>>
>> >> > >> That lxml is a problem is documented in:
>>
>> >> > >>  http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Multiple_Pyth...
>>
>> >> > >> The solution, since you are only delegating one application to that
>> >> > >> mod_wsgi daemon process group, is to add:
>>
>> >> > >>   WSGIApplicationGroup %{GLOBAL}
>>
>> >> > >> This will force the application to run in the main Python interpreter
>> >> > >> and avoid the shortcomings of the lxml module.
>>
>> >> > >> As for how you might protect against this sort of deadlock in C code
>> >> > >> when the GIL isn't locked, the only way is to use
>> >> > >> 'inactivity-timeout'. This will cause a restart when there have been
>> >> > >> no new requests and/or no reading of request content or generation of
>> >> > >> response content for that timeout period. So, this could be used as a
>> >> > >> fail safe, but if your application is used infrequently, it will also
>> >> > >> have the effect of causing your idle process to be restarted after
>> >> > >> the timeout period as well.
>>
>> >> > >> BTW, in worst cases, for detecting what the process is doing, one can
>> >> > >> use either:
>>
>> >> > >>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_...
>> >> > >>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...
>>
>> >> > >> > I'm thinking of switching to MPM/prefork, but I'm not sure if that
>> >> > >> > should have any effect, given that I'm in daemon mode already.
>>
>> >> > >> Prefork for some people has been causing subtle problems and I would
>> >> > >> avoid it if you can.
>>
>> >> > >> Graham
>>
>> >> > > --
>> >> > > You received this message because you are subscribed to the Google 
>> >> > > Groups "modwsgi" group.
>> >> > > To post to this group, send email to [email protected].
>> >> > > To unsubscribe from this
>>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.
