Re: [modwsgi] diagnosing "missing" daemons

Alec Flett Tue, 09 Feb 2010 09:13:00 -0800


On Feb 8, 2010, at 5:25 PM, Graham Dumpleton wrote:

I should add that you should also read:

 http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html

I have read this a few times... I feel fairly enlightened, but it still doesn't explain how I'm losing daemons.


The Python interpreter still gets created in those processes by default,
which is probably why you are getting confused. That is, those processes
wouldn't be replaced straight away, unlike the daemon mode processes,
which would be, as they are part of a static pool size, whereas the main
Apache server child processes are effectively part of a dynamic pool size.
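A rough way to picture the static pool described above: a parent forks a fixed number of workers, reaps each one that exits with waitpid(), and forks a replacement. This is a hypothetical Python sketch of the general pattern only, not mod_wsgi's C implementation; supervise() and its max_spawns bound are made-up names so that the demo terminates.

```python
import os


def supervise(pool_size, worker, max_spawns):
    """Keep `pool_size` children alive, respawning each one that exits.

    Hypothetical illustration of a static pool. `worker` runs in the
    child; `max_spawns` caps the total number of forks so the demo ends.
    Returns (total spawns, raw wait statuses of reaped children).
    """
    live = set()
    spawns = 0
    statuses = []

    def spawn():
        nonlocal spawns
        pid = os.fork()
        if pid == 0:  # child: run the worker, then exit without cleanup
            code = 0
            try:
                worker()
            except BaseException:
                code = 1
            os._exit(code)
        spawns += 1
        live.add(pid)

    # Fill the static pool up front.
    for _ in range(pool_size):
        spawn()

    # Reap whichever child exits and replace it while the budget allows.
    while live:
        pid, status = os.waitpid(-1, 0)
        if pid not in live:
            continue  # not one of ours
        live.discard(pid)
        statuses.append(status)
        if spawns < max_spawns:
            spawn()
    return spawns, statuses
```

A real supervisor would loop forever and usually back off if workers die immediately; the bound here only keeps the example finite.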

But I guess what I'm seeing is that the daemons really are failing to recirculate, or something. I'm still in this world where we're suddenly at 2-3 daemons when we started with 24. (I should add that we chose 24 because we do have big beefy machines with lots of RAM and cores, plus this particular application tends to be CPU-heavy.)

Thus, setting WSGIRestrictEmbedded, and so disabling the default behaviour
that sees the Python interpreter still initialised in those processes, may
clear things up.

I'll definitely give that a try... at least it should further reduce the number of log messages that may be confusing me.


Alec

Graham
On 9 February 2010 10:56, Graham Dumpleton <[email protected]> wrote:
If you are not using embedded mode, i.e., only using daemon mode, then
add the directive:

 WSGIRestrictEmbedded On

This will tell mod_wsgi not to bother to initialise the Python
interpreter in the Apache server child processes, given it will not be
required.

This presumes mod_wsgi 3.X is being used, as 2.X behaves differently.
That should eliminate those messages and make it clearer what is going on.
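To confirm which mode a request actually ran in, a small WSGI app can report the process that served it. mod_wsgi puts 'mod_wsgi.process_group' into the WSGI environ: an empty string in embedded mode, the daemon group name in daemon mode. This is a minimal diagnostic sketch, not something from this thread; mount it wherever is convenient.

```python
import os


def application(environ, start_response):
    """Report the PID and mod_wsgi mode that served this request."""
    group = environ.get('mod_wsgi.process_group')
    if group is None:
        mode = 'not running under mod_wsgi'
    elif group == '':
        mode = 'embedded'  # ran inside an Apache server child process
    else:
        mode = 'daemon:%s' % group  # ran in the named daemon process group
    body = ('pid=%d mode=%s\n' % (os.getpid(), mode)).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

With WSGIRestrictEmbedded On in effect, the 'embedded' branch should never be reached for this application.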
I will explain more later when I have the time to catch up on all my email.
Graham

On 9 February 2010 10:42, Alec Flett <[email protected]> wrote:
Ok, I think I'm starting to get a handle on what's going on.

For background, we run in prefork mode. We currently have:
StartServers         5
MinSpareServers      5
MaxSpareServers     10
ServerLimit         600
MaxClients          600
MaxRequestsPerChild  1000

For mod_wsgi I've got maximum-requests=1000

For a bunch of PIDs, these are the mod_wsgi log messages I see:
PIDs: 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 434, 473
Initializing Python.
Attach interpreter ''.
Destroying interpreters.
Cleanup interpreter ''.
Terminating Python.
Python has shutdown.
Now I did some exploring, and it turns out those PIDs are Apache children,
NOT mod_wsgi daemons.
I think that Apache is quietly shutting down Apache children, perhaps when
they reach MaxRequestsPerChild, and this is taking the mod_wsgi children
down with them, and mod_wsgi is not restarting those children. Could there
possibly be some off-by-one bug where, if we're on the 1000th request,
mod_wsgi thinks "kill this child, and restart it", but then Apache comes in
and kills the child just before it starts?
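To separate the PIDs behind those log lines into daemons and plain Apache children, a minimal sketch can group messages by PID and drop known daemon PIDs (gathered, for example, from ps). It assumes info-level lines of the form "mod_wsgi (pid=413): Initializing Python."; that pattern and the helper names are hypothetical, so adjust them to what your error log actually contains.

```python
import re
from collections import defaultdict

# Assumed shape of a mod_wsgi info line; adjust to your actual logs.
LINE_RE = re.compile(r"mod_wsgi \(pid=(\d+)\): (.*)$")


def lifecycle_by_pid(lines):
    """Group mod_wsgi log messages by the PID that emitted them."""
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            events[int(match.group(1))].append(match.group(2).strip())
    return dict(events)


def non_daemon_pids(events, daemon_pids):
    """PIDs that initialised Python but are not known daemon processes."""
    return sorted(pid for pid, msgs in events.items()
                  if "Initializing Python." in msgs and pid not in daemon_pids)
```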

Alec

On Feb 8, 2010, at 11:38 AM, Alec Flett wrote:
Ok, I've now found wsgi_manage_process...

FWIW I haven't been able to reproduce the crash by calling
os.kill(os.getpid(), signal.SIGBUS), and frankly I'm not even sure how
specifically our children are crashing, whether it's a SIGBUS or something
else. All I know is the state I find the appserver in, and there's little
to nothing in the logs.
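One way to see how a child process died is to let the parent collect its status with os.waitpid() and decode it. The sketch below (Unix only; die_by_signal() is a hypothetical helper) forks a child that sends itself a signal via os.kill(os.getpid(), ...) and reports what terminated it. It restores the signal's default action first, in case something like faulthandler has installed a handler, and disables core files so the demo leaves nothing behind.

```python
import os
import resource
import signal


def die_by_signal(signum):
    """Fork a child that kills itself with `signum`; describe how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file
            signal.signal(signum, signal.SIG_DFL)  # undo any installed handler
            os.kill(os.getpid(), signum)
        finally:
            os._exit(1)  # only reached if the signal did not kill us
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by %s" % signal.Signals(os.WTERMSIG(status)).name
    return "exited with status %d" % os.WEXITSTATUS(status)
```

The same WIFSIGNALED/WTERMSIG decoding applies to whatever status a real supervisor receives, which is how it can tell a SIGBUS from a SIGSEGV or a clean exit.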

I'm going to keep digging...

Alec

On Feb 8, 2010, at 10:35 AM, Alec Flett wrote:
So I'm still seeing this problem: our Python processes are crashing
for some reason (our problem, I'm sure), but mod_wsgi isn't restarting them.
I just perused the mod_wsgi.c source and I don't see anything that would
restart children if they crashed. In particular, I don't see anything
catching SIGCHLD, but I'm willing to believe the apr_ APIs are doing
this in a different way.

Also, is there some kind of scoreboard telling which children are
available to receive new requests? The server continues to serve
requests except for the missing children, leading me to believe mod_wsgi
has somehow figured out that the dead children are not allowed to handle
new requests.

Can you point me at the crash-recovery code?

Alec

On Jan 28, 2010, at 9:51 PM, Graham Dumpleton wrote:
2010/1/29 Alec Flett <[email protected]>:
On Jan 27, 2010, at 3:07 PM, Graham Dumpleton wrote:
It should restart on a crash automatically.
One cause of what you are seeing is Python threads being deadlocked
and over time causing available threads to be used up.

Are you using multithread daemons? Is your code and third party
modules thread safe?
nope, single-threaded! threads=1 on the WSGIDaemonProcess line.
Try setting 'inactivity-timeout=120' as an option to WSGIDaemonProcess.
great, that seems like a good idea anyway.
I would also suggest setting LogLevel to 'info' so that additional
information about process restarts is printed to the error logs.
That was going to be my next question ...:)
This way you might get an idea of what request threads are actually
doing.
So none of this explains the "missing daemons" problem, where the daemons
are not actually starting back up again... As you can see below, I set the
display-name so that I can look at the daemons with "ps". When I do a
"ps ax | grep <group>", I only see a few processes.
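A sketch of the same count without the "grep matches itself" noise. It assumes display-name=%{GROUP} shows up in ps as "(wsgi:GROUP)"; check that against your own ps output. count_daemons() and live_daemon_count() are hypothetical helpers.

```python
import subprocess


def count_daemons(ps_output, group):
    """Count ps lines whose command shows the daemon display name."""
    tag = "(wsgi:%s)" % group  # assumed rendering of display-name=%{GROUP}
    return sum(1 for line in ps_output.splitlines() if tag in line)


def live_daemon_count(group):
    """Run `ps ax` and count the matching daemon processes."""
    out = subprocess.run(["ps", "ax"], capture_output=True, text=True,
                         check=True).stdout
    return count_daemons(out, group)
```

Logging live_daemon_count("client-freebase.com") from cron every few minutes would show when the pool starts shrinking, which could then be lined up against the error log.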
The extra level of logging may show if processes are doing some sort
of shutdown. If they are crashing, then you should already see
segmentation fault messages in the main Apache error log, not the virtual
host one, so make sure you check both logs.
The processes should be restarted if they truly exit or crash. If it
is an orderly process restart due to maximum requests or the WSGI script
file being touched, there is also a fail-safe which defaults to 5
seconds: if the process doesn't die in that time, a thread should cause it
to kill itself. The only way this would fail is if some C extension module
for Python had registered a competing C-level signal handler, or blocked
signals, and so interfered with mod_wsgi. In that case, though, the process
would still exist and you should still see it.
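The fail-safe described above can be pictured as a timer armed when shutdown begins. This is a hypothetical Python analogue (mod_wsgi does this in C), with a pluggable action so it can be exercised without killing the caller; start_shutdown_watchdog() is a made-up name.

```python
import os
import threading


def start_shutdown_watchdog(timeout, on_timeout=None):
    """Arm a timer that forces the process out if shutdown stalls.

    If graceful cleanup has not finished within `timeout` seconds,
    `on_timeout` runs. The default is os._exit(1), which skips all
    Python-level cleanup. Call .cancel() on the returned timer once
    shutdown completes normally.
    """
    if on_timeout is None:
        on_timeout = lambda: os._exit(1)
    timer = threading.Timer(timeout, on_timeout)
    timer.daemon = True  # the timer itself never keeps the process alive
    timer.start()
    return timer
```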

If it was an Apache restart that triggered the process restart, you
presumably would have known about that, unless you have some automated
system which does that. Even so, Apache will kill off any daemon processes
which don't shut down within 3 seconds.
It also can't be the case that the processes are zombies, because that
would mean Apache isn't doing a wait on their exit status, which it should be.
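On Linux, whether a process is a zombie can be checked directly in /proc/PID/stat, where state 'Z' means it has exited but its parent has not yet reaped it with wait()/waitpid(). A minimal, Linux-only sketch; process_state() and wait_for_state() are hypothetical helper names.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of `pid` from /proc, or None if gone."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            stat = f.read()
    except FileNotFoundError:
        return None
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until `pid` reaches state `wanted`; return True if it did."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False
```

Running process_state() over the PIDs of the remaining daemons would quickly show whether any are sitting in 'Z' rather than having vanished.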

So, all quite confusing.
(In fact, one of my servers in production has dropped from the
original 24 processes down to 7 yesterday, and now only 3 today!)
Unless you have long-lived requests, 24 processes is actually quite a
lot. Any well-tuned system should manage with a lot less.
Even with that number of processes, since they are not multithreaded,
I wouldn't expect you to run out of resources unless you have a problem in
your code with not releasing file descriptors. You might, though, use lsof,
ofiles or another tool to work out whether a large number of file
descriptors are in use. Even then, if Apache/mod_wsgi can't restart
processes because of that, you should see error messages in the main
Apache error log.
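On Linux, a quick alternative to lsof for a single process is counting the entries under /proc/PID/fd. A minimal sketch with a hypothetical helper name; lsof also lists things that are not descriptors (memory-mapped files, the working directory), so its counts will usually run higher.

```python
import os


def open_fd_count(pid="self"):
    """Count open file descriptors for `pid` via /proc (Linux only).

    Reading another user's process generally requires privileges.
    """
    return len(os.listdir("/proc/%s/fd" % pid))
```

Sampling open_fd_count(pid) for each daemon over time would show whether descriptors are leaking as requests accumulate.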

Graham
Let me know what you find and also post your actual daemon mode
configuration.
Here's one of them:

#############################
# Project: client
##############################

WSGIDaemonProcess client-freebase.com processes=24 threads=1 \
    display-name=%{GROUP} \
    python-path=/mw/app/client_88277/_install/lib/python2.6/site-packages \
    maximum-requests=1000

WSGIScriptAlias / /mw/app/client_88277/_install/bin/client.wsgi

# Server configuration for client
<Directory /mw/app/client_88277/_install/bin>
WSGIProcessGroup client-freebase.com
</Directory>
Graham

--
You received this message because you are subscribed to the Google Groups
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/modwsgi?hl=en.