In the last posts of the thread at:

  http://groups.google.com/group/modwsgi/browse_frm/thread/b0fced31f191df59#

I explained the different sequences of logged messages you would see
in the Apache error log for the different ways that mod_wsgi daemon
processes could exit.

Did you analyse your logs and capture what you believe was the
particular trigger for the daemon processes exiting in the first
place, i.e., crash, maximum requests, signal, inactivity timeout or
deadlock timeout?
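
As a starting point for that analysis, here is a minimal sketch that
tallies error log lines by likely trigger. The message fragments are
assumptions about what mod_wsgi logs, not confirmed text, so check
them against your own error log and adjust. The function name and the
category labels are made up for illustration. An exit caused by a
signal has no dedicated fragment here and would need to be identified
by hand.

```python
from collections import Counter

# ASSUMPTION: these fragments approximate mod_wsgi's log messages.
# Verify each one against real lines in your Apache error log.
TRIGGER_FRAGMENTS = {
    "maximum requests": "maximum requests reached",
    "inactivity timeout": "inactivity timer expired",
    "deadlock timeout": "deadlock timer expired",
    "crash": "has died",
}


def count_exit_triggers(lines):
    """Count mod_wsgi log lines that match each assumed trigger fragment."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        if "mod_wsgi" not in lowered:
            continue  # ignore lines not logged by mod_wsgi
        for trigger, fragment in TRIGGER_FRAGMENTS.items():
            if fragment in lowered:
                counts[trigger] += 1
                break  # one trigger per line is enough
    return counts
```

To use it, pass an open file, for example
count_exit_triggers(open(path)) where path is wherever your Apache
error log lives.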

Did you also use mod_wsgi 3.X and WSGIRestrictEmbedded to stop the
Python interpreter being initialised in the Apache server child
processes (as distinct from the mod_wsgi daemon processes)? That
eliminates from the Apache error log the potentially confusing
messages about Python being cleaned up in Apache server child
processes, even though Python wasn't being used in those processes.

Did you, after having added the display-name option to
WSGIDaemonProcess, capture output from 'ps' showing the normal state
of operation, with 'httpd' processes alongside the 'wsgi' processes,
and then again later once you believe the number of processes had
reduced, so I can see it? This would be a much easier visual way of
seeing whether processes have dropped in number than relying on
matching up log file entries. Ensure the 'ps' output shows
parent/child relationships, and don't filter out zombie processes,
whether they have a distinct process name or not, so we can see
whether processes are perhaps stuck shutting down rather than having
actually vanished.
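
If capturing this by hand is awkward, the following is a rough sketch
of how the snapshots could be taken and summarised. It assumes a 'ps'
that accepts '-eo pid,ppid,stat,args', that the Apache processes
contain 'httpd' in their command line (on some distributions it is
'apache2'), and that display-name makes the daemon processes show up
with 'wsgi' in their name. The function names are made up for
illustration.

```python
import subprocess
from collections import Counter


def parse_ps(text):
    """Parse 'pid ppid stat command' rows into dicts, skipping the header."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # header or malformed line
        pid, ppid, stat, command = parts
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "command": command.strip()})
    return rows


def summarize(rows, names=("httpd", "wsgi")):
    """Count processes per name and list the pids of zombies among them."""
    counts = Counter()
    zombies = []
    for row in rows:
        for name in names:
            if name in row["command"]:
                counts[name] += 1
                if row["stat"].startswith("Z"):  # 'Z' marks a zombie
                    zombies.append(row["pid"])
                break
    return counts, zombies


def snapshot():
    """Run ps (assumed flags, see above) and summarise its output."""
    out = subprocess.run(["ps", "-eo", "pid,ppid,stat,args"],
                         capture_output=True, text=True, check=True).stdout
    return summarize(parse_ps(out))
```

Calling snapshot() once while things are healthy and again when you
believe daemons have gone missing gives two sets of counts to compare.
The rows from parse_ps() keep the ppid, so the parent/child
relationships are still available.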

BTW, I have never seen you post about what version of Apache you are
using, nor what operating system/distribution you are using.

More below.

On 2 March 2010 04:48, Alec Flett <[email protected]> wrote:
> I've posted here before wondering about disappearing Python daemons, and I 
> have a new theory about apache children and python daemon lifetimes. We're 
> doing our own experiments on this stuff but I'm hoping to get some insight 
> because (as you'll see below) it can sometimes take days to figure out if our 
> experiment has worked or not.
>
> For background, again: We are using mod_wsgi in daemon mode, using the 
> prefork MPM and we are finding that we're "losing" daemons - after apache has 
> been running for a while (2-3 days) we find that there are fewer and fewer 
> python daemons serving up a given WSGIProcessGroup - we have 3 process 
> groups, and the one that handles 80% of the traffic on the site is the one 
> that loses the most daemons. We start out each process group with 24 daemons, 
> and after a few days, the high-traffic group usually only has two or three 
> daemons left, and number of waiting apache processes starts to skyrocket - we 
> start hitting MaxClients, set to 600 (big, I know!) and then the server 
> starts dropping connections altogether.
>
> (as an aside, I know these are big numbers, but the high-traffic process 
> group is a very CPU-intense application, and these are on big honkin' servers 
> with a ton of RAM. There are actually 8 physical servers too...)
>
> My new theory is related to the following settings we're currently using, 
> both purely for historic reasons related to problems with mod_python and our 
> own leaky code:
>
> 1) we set MaxRequestPerChild to 1000
> 2) we don't have GracefulShutdownTimeout set
>
> In going through our logs, tracking each apache child by PID and each python 
> daemon by PID, what I found was that for every python daemon that was 
> started, there are clean log messages about shutting down those daemons, and 
> that they are usually closely linked with an apache process hitting its 
> MaxRequestPerChild limit.
>
> My theory is that apache (or apr?) is tracking all the children that the 
> apache children fork, and when an apache child exits, it's also cleaning up 
> any fork'ed children so that there are no stray daemons hanging around - i.e. 
> something beyond the realm of mod_wsgi. And when the python daemon is 
> 'cleaned up' its too late for mod_wsgi to fork a new one - i.e. mod_wsgi has 
> already been unhooked or something.

There should be no connection. The mod_wsgi daemon processes are
managed separately from the Apache server child processes, and in
both cases they are forked from the Apache parent process. In other
words, mod_wsgi daemon processes are not forked from Apache server
child processes, and so an Apache server child process being killed
should not cause a mod_wsgi daemon process to be killed.

> I think the GracefulShutdownTimeout may also be related - it's also unset for 
> previous mod_python problems - as I understand it, if you don't set 
> GracefulShutdownTimeout, then apache will immediately kill all children on 
> shutdown, rather than giving them time to complete their current request.

Not entirely true. Apache will send a SIGINT signal to the sub
process, but this doesn't cause it to die immediately. The process
does have a window in which to finish the request. If the Apache
parent process doesn't see the process terminate within 1 second, it
sends SIGINT again. It does the same and sends another SIGINT after a
further 1 second. If the process still hasn't exited 3 seconds after
the first signal, it sends a SIGKILL.
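
To make that timing concrete, here is a small sketch of the same
escalation pattern applied to an ordinary subprocess. It is not
Apache's code, only an illustration of the sequence described above:
SIGINT at 0, 1 and 2 seconds, then SIGKILL at 3 seconds if the process
is still alive. It assumes a Unix-like system, and the function name
and parameters are made up.

```python
import signal
import subprocess
import time


def stop_with_escalation(proc, interval=1.0, kill_after=3.0):
    """Send SIGINT every `interval` seconds, then SIGKILL at `kill_after`.

    Returns the names of the signals sent, in order.
    """
    sent = []
    start = time.monotonic()
    next_sigint = 0.0
    while proc.poll() is None:  # loop until the process has exited
        elapsed = time.monotonic() - start
        if elapsed >= kill_after:
            proc.kill()  # SIGKILL cannot be caught or ignored
            sent.append("SIGKILL")
            proc.wait()
            break
        if elapsed >= next_sigint:
            proc.send_signal(signal.SIGINT)  # polite request to stop
            sent.append("SIGINT")
            next_sigint += interval
        time.sleep(0.05)  # short poll so an early exit is noticed quickly
    return sent
```

A process that finishes its work promptly after the first SIGINT never
sees the SIGKILL. One that takes longer than the 3 second window is
killed regardless of what it was doing.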

> I'm wondering if it also affects the way the MaxRequestsPerChild setting 
> works, that the apache child gets killed harder than it normally would, when 
> MaxRequestsPerChild is reached?

Max requests per child is handled within the Apache server child
process, and when it is reached the process will exit of its own
accord. Thus no signals are involved. I don't recollect what would
happen in a multithreaded MPM if requests are tardy in completing. I
sort of remember that the process will not accept new requests once
it gets to this state.

Anyway, this is only pertinent to Apache server child processes and
not mod_wsgi daemon processes, as the code for mod_wsgi daemon
processes is completely different and doesn't rely on that
directive.

> Any insights here? Since our apache usage is purely just a mod_wsgi 
> container, I'm wondering if it's generally considered safe to use 
> 'MaxRequestPerChild 0' in a production environment? We haven't used that 
> setting ever, so we've just been wary of not having ANY limit...

If you are not handling dynamic requests of any form in the Apache
server child processes, be it PHP, Python, Perl etc., then there is
generally no reason to set MaxRequestsPerChild to anything but 0. If
Apache is only handling static file serving and/or proxying to
mod_wsgi daemon processes, then the worker MPM would be a much better
choice than prefork, as I have mentioned before.

Similarly, unless you have issues with resource leakage in your
Python application, I wouldn't use the maximum-requests option to
WSGIDaemonProcess either. For a site with a large amount of traffic,
I would certainly be cautious about setting it to too low a value,
i.e., 1000, if that could cause processes to be recycled within a
matter of seconds.
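
As a rough back-of-the-envelope check, the sketch below estimates how
long one daemon process would live for a given maximum-requests value.
It assumes requests are spread evenly across the processes in the
group, which real traffic will not do exactly. The request rate in the
example is a hypothetical figure, not one taken from your site.

```python
def seconds_between_restarts(maximum_requests, requests_per_second, processes):
    """Approximate lifetime of one daemon process under evenly spread load."""
    per_process_rate = requests_per_second / processes
    return maximum_requests / per_process_rate


# Hypothetical load: 480 requests/second across a group of 24 daemons.
print(seconds_between_restarts(1000, 480, 24))  # 50.0
```

At that made-up rate each process would be recycled after roughly 50
seconds, so with 24 processes a restart would be happening somewhere
in the group about every 2 seconds.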

Graham
