I've posted here before wondering about disappearing Python daemons, and I have 
a new theory about apache children and python daemon lifetimes. We're doing our 
own experiments on this stuff but I'm hoping to get some insight because (as 
you'll see below) it can sometimes take days to figure out if our experiment 
has worked or not.

For background, again: we are using mod_wsgi in daemon mode with the prefork 
MPM, and we are finding that we're "losing" daemons. After apache has been 
running for a while (2-3 days), fewer and fewer python daemons are serving a 
given WSGIProcessGroup. We have 3 process groups, and the one that handles 80% 
of the traffic on the site is the one that loses the most daemons. We start 
each process group with 24 daemons, and after a few days the high-traffic 
group usually has only two or three daemons left. The number of waiting apache 
processes then starts to skyrocket: we start hitting MaxClients, set to 600 
(big, I know!), and then the server starts dropping connections altogether.
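
For anyone who wants to watch for the same thing, here is a minimal sketch of 
how the daemon count per process group could be tracked over time. It assumes 
each WSGIDaemonProcess is configured with display-name=%{GROUP}, so that the 
daemons show up in ps as "(wsgi:<group>)"; the function names are hypothetical 
and this is not necessarily how we are counting them ourselves.

```python
import re
import subprocess
from collections import Counter

# Assumption: WSGIDaemonProcess ... display-name=%{GROUP} is set, so each
# daemon's process title looks like "(wsgi:<group>)".
WSGI_TITLE = re.compile(r"\(wsgi:([^)]+)\)")


def count_daemons(ps_output):
    """Count mod_wsgi daemons per process group in the text output of ps."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = WSGI_TITLE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def current_counts():
    """Run ps and count the daemons that are alive right now."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return count_daemons(result.stdout)
```

Calling current_counts() from cron every few minutes and logging the result 
should show when a group starts dropping below its configured 24 daemons, 
without having to wait for MaxClients to be hit.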

(As an aside, I know these are big numbers, but the high-traffic process group 
is a very CPU-intensive application, and these are big honkin' servers with a 
ton of RAM. There are actually 8 physical servers, too...)
 
My new theory is related to the following settings we're currently using, both 
purely for historic reasons related to problems with mod_python and our own 
leaky code:

1) we set MaxRequestsPerChild to 1000
2) we don't have GracefulShutdownTimeout set

In going through our logs, tracking each apache child by PID and each python 
daemon by PID, I found that for every python daemon that was started there is 
a clean log message about that daemon shutting down, and that these shutdowns 
usually coincide closely with an apache process hitting its 
MaxRequestsPerChild limit.
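
The PID tracking above can be sketched roughly as follows. The two message 
shapes in the regular expression are an assumption about what mod_wsgi writes 
to the error log at LogLevel info, so check them against your own log before 
relying on this; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed message shapes (verify against your own error log):
#   mod_wsgi (pid=1234): Starting process 'group' with uid=...
#   mod_wsgi (pid=1234): Shutdown requested 'group'.
EVENT = re.compile(
    r"mod_wsgi \(pid=(\d+)\): (Starting process|Shutdown requested) '([^']+)'"
)


def live_daemons(log_lines):
    """Return {group: set of PIDs that were started and not yet shut down}."""
    live = defaultdict(set)
    for line in log_lines:
        match = EVENT.search(line)
        if not match:
            continue  # not a daemon start/shutdown message
        pid = int(match.group(1))
        event = match.group(2)
        group = match.group(3)
        if event == "Starting process":
            live[group].add(pid)
        else:
            live[group].discard(pid)
    return live
```

If a group's set keeps shrinking as the log is replayed, daemons are being 
shut down without a replacement being started, which is the pattern I'm 
describing.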

My theory is that apache (or apr?) is tracking all the processes that the 
apache children fork, and when an apache child exits, it also cleans up any 
fork'ed children so that there are no stray daemons hanging around - i.e. 
something beyond the realm of mod_wsgi. And by the time the python daemon is 
'cleaned up' it's too late for mod_wsgi to fork a new one - i.e. mod_wsgi has 
already been unhooked or something.

I think GracefulShutdownTimeout may also be related - it's also unset because 
of previous mod_python problems. As I understand it, if you don't set 
GracefulShutdownTimeout, then apache will immediately kill all children on 
shutdown, rather than giving them time to complete their current request. I'm 
wondering if it also affects the way the MaxRequestsPerChild setting works, so 
that the apache child gets killed harder than it normally would when 
MaxRequestsPerChild is reached?

Any insights here? Since our apache usage is purely as a mod_wsgi container, 
I'm wondering if it's generally considered safe to use 'MaxRequestsPerChild 0' 
in a production environment? We have never used that setting, so we've just 
been wary of not having ANY limit...

Alec
