Re: [modwsgi] diagnosing "missing" daemons

Graham Dumpleton Mon, 08 Feb 2010 15:56:51 -0800

If you are not using embedded mode, i.e., only using daemon mode, then
add the directive:


  WSGIRestrictEmbedded On

This will tell mod_wsgi not to bother to initialise the Python
interpreter in the Apache server child processes, given it will not be
required.

This presumes mod_wsgi 3.X is being used as 2.X behaves differently.

That should eliminate those messages and make it clearer what is going on.
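
As a rough sketch of how to check which processes those messages belong to, the script below groups mod_wsgi log messages by pid. Pids that only ever show the initialise/shutdown sequence can then be compared against the daemon pids reported by ps. The `mod_wsgi (pid=N): message` line shape and the name `group_by_pid` are assumptions for illustration; adjust the regular expression to match the actual error log.

```python
import re
from collections import defaultdict

# Assumed line shape: "... mod_wsgi (pid=413): Initializing Python."
LINE_RE = re.compile(r"mod_wsgi \(pid=(\d+)\): (.*)$")


def group_by_pid(lines):
    """Map each pid to the ordered list of mod_wsgi messages it logged."""
    messages = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match:
            messages[int(match.group(1))].append(match.group(2))
    return dict(messages)


# Hypothetical sample lines; in practice read them from the Apache error log.
sample = [
    "[info] mod_wsgi (pid=413): Initializing Python.",
    "[info] mod_wsgi (pid=413): Attach interpreter ''.",
    "[info] mod_wsgi (pid=414): Initializing Python.",
    "[info] mod_wsgi (pid=413): Python has shutdown.",
    "[notice] unrelated line",
]
grouped = group_by_pid(sample)
for pid in sorted(grouped):
    print(pid, grouped[pid])
```

Lines that do not match the pattern are skipped, so the same function can be fed a whole error log.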

I will explain more later when I have the time to catch up on all my email.

Graham

On 9 February 2010 10:42, Alec Flett <[email protected]> wrote:
>
> Ok, I think I'm starting to get a handle on what's going on.
>
> For background, we run in prefork mode. We currently have:
> StartServers         5
> MinSpareServers      5
> MaxSpareServers     10
> ServerLimit         600
> MaxClients          600
> MaxRequestsPerChild  1000
>
> For mod_wsgi I've got maximum-requests=1000
>
> For a bunch of PIDs, these are the mod_wsgi log messages I see:
> pids: 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425,
> 434, 473
> Initializing Python.
> Attach interpreter ''.
> Destroying interpreters.
> Cleanup interpreter ''.
> Terminating Python.
> Python has shutdown.
>
> Now I did some exploring and it turns out those PIDs are apache children,
> NOT mod_wsgi daemons.
>
> I think that apache is quietly shutting down apache children, perhaps when
> they reach MaxRequestsPerChild, and this is taking the mod_wsgi children
> down with them, and mod_wsgi is not restarting those children. Could there
> possibly be some off-by-one bug where if we're on the 1000th request,
> mod_wsgi thinks "kill this child, and restart it" but then apache comes in
> and kills the child just before it starts?
>
> Alec
>
> On Feb 8, 2010, at 11:38 AM, Alec Flett wrote:
>
>> Ok, I've now found wsgi_manage_process...
>>
>> FWIW I haven't been able to reproduce the crash by calling
>> os.kill(os.getpid(), signal.SIGBUS) and frankly I'm not even sure how
>> specifically our children are crashing, if it's a SIGBUS or something else.
>> All I know is the state I find the appserver in and there's little to
>> nothing in the logs.
>>
>> I'm going to keep digging...
>>
>> Alec
>>
>> On Feb 8, 2010, at 10:35 AM, Alec Flett wrote:
>>
>>> So I'm still seeing this problem - that our python processes are crashing
>>> for some reason (our problem, I'm sure) but mod_wsgi isn't restarting them.
>>>
>>> I just perused the mod_wsgi.c source and I don't see anything that would
>>> restart children if they crashed. In particular, I don't see anything
>>> catching SIGCHLD but I'm willing to believe the apr_ APIs are doing
>>> this in a different way.
>>>
>>> Also, is there some kind of scoreboard telling which children are
>>> available to receive new requests? Because the server continues to serve
>>> requests except for the missing children, leading me to believe mod_wsgi has
>>> somehow figured out that the dead children are not allowed to handle new
>>> requests.
>>>
>>> Can you point me at the crash-recovery code?
>>>
>>> Alec
>>>
>>> On Jan 28, 2010, at 9:51 PM, Graham Dumpleton wrote:
>>>
>>>> 2010/1/29 Alec Flett <[email protected]>:
>>>>>
>>>>> On Jan 27, 2010, at 3:07 PM, Graham Dumpleton wrote:
>>>>>
>>>>>> Should restart on a crash automatically.
>>>>>>
>>>>>> One cause of what you are seeing is Python threads being deadlocked
>>>>>> and over time causing available threads to be used up.
>>>>>>
>>>>>> Are you using multithread daemons? Is your code and third party
>>>>>> modules thread safe?
>>>>>>
>>>>>
>>>>> nope, single-threaded! threads=1 on the WSGIDaemonProcess line.
>>>>>
>>>>>> Try setting 'inactivity-timeout=120' as option to WSGIDaemonProcess.
>>>>>>
>>>>>
>>>>> great, that seems like a good idea anyway.
>>>>>>
>>>>>> I would also suggest setting LogLevel to 'info' so that additional
>>>>>> information is printed out in error logs about process restarts.
>>>>>>
>>>>> That was going to be my next question ...:)
>>>>>
>>>>>>
>>>>>> This way you might get an idea what request threads are actually
>>>>>> doing.
>>>>>>
>>>>> So none of this explains the "missing daemons" problem - where the
>>>>> daemons are not actually starting back up again... as you can see
>>>>> below, I set the display-name so that I can look at the daemons with
>>>>> "ps" - when I do a ps ax | grep <group> I only see a few processes
>>>>
>>>> The extra level of logging may show if processes are doing some sort
>>>> of shutdown. If they are crashing, then you should already see
>>>> segmentation fault messages in main Apache error log, not virtual
>>>> host, so make sure you check both logs.
>>>>
>>>> The processes should be restarted if they truly exit or crash. If it
>>>> is an orderly process restart due to maximum requests or the WSGI script
>>>> file being touched, there is also a fail safe which defaults to 5
>>>> seconds. If it doesn't die in that time a thread should cause it to
>>>> kill itself. The only way this would not work is if some C
>>>> extension module for Python had registered a competing C code level
>>>> signal handler or blocked signals and it interfered with mod_wsgi. In
>>>> that case though the process would still exist and you should still
>>>> see it.
>>>>
>>>> If it was an Apache restart that triggered process restart, you
>>>> presumably would have known about that unless you have some automated
>>>> system which does that. Even so, Apache will kill off any daemon
>>>> processes which don't shut down in 3 seconds.
>>>>
>>>> It also can't be the case that the processes are zombies, because that
>>>> would mean Apache isn't doing a wait on their exit code, which it
>>>> should be.
>>>>
>>>> So, all quite confusing.
>>>>
>>>>> (in fact one of my servers in production has dropped from the
>>>>> original 24 processes, down to 7 yesterday, and now only at 3 today!)
>>>>
>>>> Unless you have long lived requests, 24 processes is actually quite a
>>>> lot. Any well tuned system should manage with a lot less.
>>>>
>>>> Even with that number of processes, since they are not multithreaded,
>>>> unless you have a problem in your code with not releasing file
>>>> descriptors, I wouldn't expect you to run out of resources. You might
>>>> though use lsof or ofiles or another tool to work out if a large number
>>>> of file descriptors are in use. Even then, if Apache/mod_wsgi can't
>>>> restart processes because of that, you should see error messages in the
>>>> main Apache error log.
>>>>
>>>> Graham
>>>>
>>>>>> Let me know what you find and also post your actual daemon mode
>>>>>> configuration.
>>>>>>
>>>>>
>>>>> Here's one of them:
>>>>>
>>>>> #############################
>>>>> # Project: client
>>>>> ##############################
>>>>>
>>>>> WSGIDaemonProcess client-freebase.com processes=24 threads=1 \
>>>>>     display-name=%{GROUP} \
>>>>>     python-path=/mw/app/client_88277/_install/lib/python2.6/site-packages \
>>>>>     maximum-requests=1000
>>>>>
>>>>> WSGIScriptAlias / /mw/app/client_88277/_install/bin/client.wsgi
>>>>>
>>>>> # Server configuration for client
>>>>> <Directory /mw/app/client_88277/_install/bin>
>>>>> WSGIProcessGroup client-freebase.com
>>>>> </Directory>
>>>>>
>>>>>
>>>>>
>>>>>> Graham
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups
>>>>>> "modwsgi" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected].
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/modwsgi?hl=en.
>>>>>>
>>>>>
>>>>
>>>
>>
>
