If you are not using embedded mode, i.e., only using daemon mode, then add the directive:

  WSGIRestrictEmbedded On

This will tell mod_wsgi not to bother to initialise the Python
interpreter in the Apache server child processes, given it will not be
required. This presumes mod_wsgi 3.X is being used, as 2.X behaves
differently.

That should eliminate those messages and make it clearer what is going on.

I will explain more later when I have the time to catch up on all my email.

Graham

On 9 February 2010 10:42, Alec Flett <[email protected]> wrote:
>
> Ok, I think I'm starting to get a handle on what's going on.
>
> For background, we run in prefork mode. We currently have:
> StartServers 5
> MinSpareServers 5
> MaxSpareServers 10
> ServerLimit 600
> MaxClients 600
> MaxRequestsPerChild 1000
>
> For mod_wsgi I've got maximum-requests=1000
>
> For a bunch of PIDs, these are the mod_wsgi log messages I see:
> pids: 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425,
> 434, 473,
> Initializing Python.
> Attach interpreter ''.
> Destroying interpreters.
> Cleanup interpreter ''.
> Terminating Python.
> Python has shutdown.
>
> Now I did some exploring and it turns out those PIDs are apache children,
> NOT mod_wsgi daemons.
>
> I think that apache is quietly shutting down apache children, perhaps when
> they reach MaxRequestsPerChild, and this is taking the mod_wsgi children
> down with them, and mod_wsgi is not restarting those children. Could there
> possibly be some off-by-one bug where if we're on the 1000th request,
> mod_wsgi thinks "kill this child, and restart it" but then apache comes in
> and kills the child just before it starts?
>
> Alec
>
> On Feb 8, 2010, at 11:38 AM, Alec Flett wrote:
>
>> Ok, I've now found wsgi_manage_process...
>>
>> FWIW I haven't been able to reproduce the crash by calling
>> os.kill(os.getpid(), signal.SIGBUS) and frankly I'm not even sure how
>> specifically our children are crashing, if it's a SIGBUS or something else.
>> All I know is the state I find the appserver in and there's little to
>> nothing from the logs
>>
>> I'm going to keep digging...
>>
>> Alec
>>
>> On Feb 8, 2010, at 10:35 AM, Alec Flett wrote:
>>
>>> So I'm still seeing this problem - that our python processes are crashing
>>> for some reason (our problem, I'm sure) but mod_wsgi isn't restarting them.
>>>
>>> I just perused the mod_wsgi.c source and I don't see anything that would
>>> restart children if they crashed? In particular, I don't see anything
>>> catching SIGCHLD but I'm willing to believe the apr_ APIs are doing
>>> this in a different way.
>>>
>>> Also is there some kind of scoreboard telling which children are
>>> available to receive new requests? Because the server continues to serve
>>> requests except for the missing children, leading me to believe mod_wsgi
>>> has somehow figured out that the dead children are not allowed to handle
>>> new requests.
>>>
>>> Can you point me at the crash-recovery code?
>>>
>>> Alec
>>>
>>> On Jan 28, 2010, at 9:51 PM, Graham Dumpleton wrote:
>>>
>>>> 2010/1/29 Alec Flett <[email protected]>:
>>>>>
>>>>> On Jan 27, 2010, at 3:07 PM, Graham Dumpleton wrote:
>>>>>
>>>>>> Should restart on a crash automatically.
>>>>>>
>>>>>> One cause of what you are seeing is Python threads being deadlocked
>>>>>> and over time causing available threads to be used up.
>>>>>>
>>>>>> Are you using multithread daemons? Is your code and third party
>>>>>> modules thread safe?
>>>>>>
>>>>>
>>>>> nope, single-threaded! threads=1 on the WSGIDaemonProcess line.
>>>>>
>>>>>> Try setting 'inactivity-timeout=120' as option to WSGIDaemonProcess.
>>>>>>
>>>>>
>>>>> great, that seems like a good idea anyway.
>>>>>>
>>>>>> I would also suggest setting LogLevel to 'info' so that additional
>>>>>> information printed out in error logs about process restarts.
>>>>>>
>>>>> That was going to be my next question ...:)
>>>>>
>>>>>>
>>>>>> This way you might get an idea what request threads are actually
>>>>>> doing.
>>>>>>
>>>>> So none of this explains the "missing daemons" problem - where the
>>>>> daemons are not actually starting back up again... as you can see
>>>>> below, I set the display-name so that I can look at the daemons with
>>>>> "ps" - when I do a ps ax | grep <group> I only see a few processes
>>>>
>>>> The extra level of logging may show if processes are doing some sort
>>>> of shutdown. If they are crashing, then you should already see
>>>> segmentation fault messages in main Apache error log, not virtual
>>>> host, so make sure you check both logs.
>>>>
>>>> The processes should be restarted if they truly exit or crash. If it
>>>> is an order process restart due to maximum requests or WSGI script
>>>> file being touched, there is also a fail safe which defaults to 5
>>>> seconds. If it doesn't die in that time a thread should cause it to
>>>> kill itself. The only way this would work in that way is if some C
>>>> extension module for Python had registered a competing C code level
>>>> signal handler or blocked signals and it interfered with mod_wsgi. In
>>>> that case though the process would still exist and you should still
>>>> see it.
>>>>
>>>> If it was an Apache restart that triggered process restart, you
>>>> presumably would have known about that unless you have some automated
>>>> system which does that. Even so, Apache will kill any daemon process
>>>> off which don't shut down in 3 seconds.
>>>>
>>>> Can't also be case that processes are zombies, because that would mean
>>>> Apache isn't doing wait on their exit code, which it should be.
>>>>
>>>> So, all quite confusing.
>>>>
>>>>> (in fact one of my servers in
>>>>> production has dropped from the original 24 process, down to 7
>>>>> yesterday, and now only at 3 today!)
>>>>
>>>> Unless you have long lived requests, 24 process is actually quite a
>>>> lot. Any well tuned system should manage with a lot less.
>>>>
>>>> Even with that number of processes, since not multithreaded, unless
>>>> you have a problem in your code with not releasing file descriptors,
>>>> wouldn't expect to run out of resources. You might though use lsof or
>>>> ofiles or other tool to work out if large number of file descriptors
>>>> in use. Even then, if Apache/mod_wsgi can't restart processes because
>>>> of that, you should see error messages in main Apache error log.
>>>>
>>>> Graham
>>>>
>>>>>> Let me know what you find and also post your actual daemon mode
>>>>>> configuration.
>>>>>>
>>>>>
>>>>> Here's one of them:
>>>>>
>>>>> #############################
>>>>> # Project: client
>>>>> ##############################
>>>>>
>>>>> WSGIDaemonProcess client-freebase.com processes=24 threads=1 display-name=%{GROUP} python-path=/mw/app/client_88277/_install/lib/python2.6/site-packages maximum-requests=1000
>>>>>
>>>>> WSGIScriptAlias / /mw/app/client_88277/_install/bin/client.wsgi
>>>>>
>>>>> # Server configuration for client
>>>>> <Directory /mw/app/client_88277/_install/bin>
>>>>> WSGIProcessGroup client-freebase.com
>>>>> </Directory>
>>>>>
>>>>>
>>>>>
>>>>>> Graham
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "modwsgi" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected].
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/modwsgi?hl=en.
