Ok, I've now found wsgi_manage_process...

FWIW I haven't been able to reproduce the crash by calling os.kill(os.getpid(), signal.SIGBUS), and frankly I'm not even sure how specifically our children are crashing, whether it's a SIGBUS or something else. All I know is the state I find the appserver in, and there's little to nothing in the logs.

I'm going to keep digging...

Alec

On Feb 8, 2010, at 10:35 AM, Alec Flett wrote:

So I'm still seeing this problem - that our python processes are crashing for some reason (our problem, I'm sure) but mod_wsgi isn't restarting them.

I just perused the mod_wsgi.c source and I don't see anything that would restart children if they crashed. In particular, I don't see anything catching SIGCHLD, but I'm willing to believe the apr_ APIs are doing this in a different way.
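As a point of comparison, a parent does not need a SIGCHLD handler to notice dead children; it can simply block in a wait call and respawn. The following is a generic sketch of that wait-and-respawn pattern in Python. It is not mod_wsgi's code (which is C on top of APR), and the names spawn, supervise and max_restarts are invented for the example.

```python
import os
import signal


def spawn(worker):
    """Fork a child that runs `worker()` and then exits."""
    pid = os.fork()
    if pid == 0:
        try:
            worker()
        finally:
            os._exit(0)
    return pid


def supervise(worker, processes, max_restarts):
    """Keep `processes` children running, replacing dead ones.

    Stops respawning after `max_restarts` replacements so that the
    example terminates; returns the number of restarts performed.
    """
    children = {spawn(worker) for _ in range(processes)}
    restarts = 0
    while children:
        pid, _status = os.waitpid(-1, 0)  # blocks until any child ends
        if pid not in children:
            continue
        children.discard(pid)
        if restarts < max_restarts:
            children.add(spawn(worker))
            restarts += 1
    return restarts


def crashing_worker():
    # Simulate a crash: the worker kills itself immediately.
    os.kill(os.getpid(), signal.SIGKILL)


if __name__ == "__main__":
    print(supervise(crashing_worker, processes=2, max_restarts=3))
```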

Also, is there some kind of scoreboard telling which children are available to receive new requests? The server continues to serve requests except for the missing children, leading me to believe mod_wsgi has somehow figured out that the dead children are not allowed to handle new requests.

Can you point me at the crash-recovery code?

Alec

On Jan 28, 2010, at 9:51 PM, Graham Dumpleton wrote:

2010/1/29 Alec Flett <[email protected]>:

On Jan 27, 2010, at 3:07 PM, Graham Dumpleton wrote:

Should restart on a crash automatically.

One cause of what you are seeing is Python threads being deadlocked
and over time causing available threads to be used up.

Are you using multithread daemons? Is your code and third party
modules thread safe?


nope, single-threaded! threads=1 on the WSGIDaemonProcess line.

Try setting 'inactivity-timeout=120' as option to WSGIDaemonProcess.


great, that seems like a good idea anyway.

I would also suggest setting LogLevel to 'info' so that additional
information is printed out in error logs about process restarts.

That was going to be my next question ...:)


This way you might get an idea what request threads are actually doing.

So none of this explains the "missing daemons" problem - where the daemons are not actually starting back up again... as you can see below, I set the display-name so that I can look at the daemons with "ps" - when I do a ps ax | grep <group> I only see a few processes
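To track the count over time rather than eyeballing it, the same check can be scripted. This is a rough sketch that counts the lines of "ps ax" output mentioning the group name. The function names are invented, the sample output is made up for illustration, and the "(wsgi:...)" process title in it is an assumption about how display-name=%{GROUP} shows up in ps.

```python
import subprocess


def count_daemons(ps_output, group):
    """Count lines of `ps ax` output whose text mentions `group`."""
    return sum(1 for line in ps_output.splitlines() if group in line)


def live_daemon_count(group):
    """Run `ps ax` and count matching processes (needs `ps` on PATH)."""
    result = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    )
    return count_daemons(result.stdout, group)


# Made-up sample output, only to show the parsing.
SAMPLE = """\
  PID TTY      STAT   TIME COMMAND
  101 ?        Ss     0:01 /usr/sbin/httpd
  102 ?        Sl     0:07 (wsgi:client-freebase.com)
  103 ?        Sl     0:05 (wsgi:client-freebase.com)
"""

if __name__ == "__main__":
    print(count_daemons(SAMPLE, "client-freebase.com"))
```

If the group name is passed on the script's own command line, the script itself may be counted as a match, much like grep shows up in its own ps listing.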

The extra level of logging may show if processes are doing some sort
of shutdown. If they are crashing, then you should already see
segmentation fault messages in main Apache error log, not virtual
host, so make sure you check both logs.

The processes should be restarted if they truly exit or crash. If it
is an orderly process restart due to maximum requests or the WSGI script
file being touched, there is also a fail safe which defaults to 5
seconds. If the process doesn't die in that time, a thread should cause
it to kill itself. The only way this wouldn't work is if some C
extension module for Python had registered a competing C code level
signal handler or blocked signals and it interfered with mod_wsgi. In
that case though the process would still exist and you should still
see it.
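The fail-safe idea described above (a thread that kills the process if shutdown takes too long) can be illustrated with a small Python sketch. This is only an analogy for the behaviour, not mod_wsgi's implementation; arm_failsafe and run_stuck_child are hypothetical names, and the timeout used is arbitrary rather than the 5 second default.

```python
import os
import signal
import threading
import time


def arm_failsafe(timeout):
    """Start a thread that force-kills this process after `timeout` seconds."""
    def reaper():
        time.sleep(timeout)
        os.kill(os.getpid(), signal.SIGKILL)

    thread = threading.Thread(target=reaper, daemon=True)
    thread.start()
    return thread


def run_stuck_child(timeout):
    """Fork a child whose shutdown never completes; report whether the
    fail-safe thread ended it with SIGKILL."""
    pid = os.fork()
    if pid == 0:
        try:
            arm_failsafe(timeout)
            while True:  # simulate a shutdown that hangs forever
                time.sleep(1)
        finally:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL


if __name__ == "__main__":
    print(run_stuck_child(0.2))
```

SIGKILL is used in the sketch because it cannot be caught or blocked by a competing handler.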

If it was an Apache restart that triggered process restart, you
presumably would have known about that unless you have some automated
system which does that. Even so, Apache will kill off any daemon
processes which don't shut down within 3 seconds.

It also can't be the case that the processes are zombies, because that
would mean Apache isn't doing a wait on their exit code, which it should be.

So, all quite confusing.

(in fact one of my servers in
production has dropped from the original 24 processes, down to 7 yesterday,
and now only at 3 today!)

Unless you have long lived requests, 24 processes is actually quite a
lot. Any well tuned system should manage with a lot less.

Even with that number of processes, since not multithreaded, unless
you have a problem in your code with not releasing file descriptors,
wouldn't expect to run out of resources. You might though use lsof or
ofiles or other tool to work out if large number of file descriptors
in use. Even then, if Apache/mod_wsgi can't restart processes because
of that, you should see error messages in main Apache error log.
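As a quick alternative to lsof, a process can report its own descriptor usage. A minimal sketch, assuming Linux (it reads /proc/<pid>/fd); the function names are invented for the example.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open file descriptors by listing /proc/<pid>/fd.

    Linux-specific assumption: each entry in that directory is one
    open descriptor. The listing itself briefly uses one descriptor.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


def fd_headroom():
    """Return (open_count, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft


if __name__ == "__main__":
    used, limit = fd_headroom()
    print("open fds: %d, soft limit: %d" % (used, limit))
```

Calling something like fd_headroom() periodically from inside the application and logging the result would show whether descriptor usage grows over the life of a daemon process.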

Graham

Let me know what you find and also post your actual daemon mode
configuration.


Here's one of them:

#############################
# Project: client
##############################

WSGIDaemonProcess client-freebase.com processes=24 threads=1 display-name=%{GROUP} python-path=/mw/app/client_88277/_install/lib/python2.6/site-packages maximum-requests=1000

WSGIScriptAlias / /mw/app/client_88277/_install/bin/client.wsgi

# Server configuration for client
<Directory /mw/app/client_88277/_install/bin>
WSGIProcessGroup client-freebase.com
</Directory>



Graham

--
You received this message because you are subscribed to the Google Groups
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/modwsgi?hl=en.

