Re: [modwsgi] diagnosing "missing" daemons

Graham Dumpleton Tue, 09 Feb 2010 16:39:58 -0800

BTW, do you have:

  LogLevel info

(or debug) set in your Apache configuration?

If you do, then the main Apache error log should be logging when
daemon processes are stopped and started. For example:

[Wed Feb 10 11:18:29 2010] [info] mod_wsgi (pid=1563): Starting process 'tests' with uid=501, gid=20 and threads=15.

Where the daemon process is explicitly killed, you will see:

[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Process 'tests' has died, restarting.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1568): Starting process 'tests' with uid=501, gid=20 and threads=15.

Because this was done with SIGINT, and so was a graceful shutdown, you
will see the following in the virtual-host-specific error log (or in the
main error log if there is no virtual host error log):

[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Shutdown requested 'tests'.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Stopping process 'tests'.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Destroying interpreters.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Cleanup interpreter ''.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Terminating Python.
[Wed Feb 10 11:18:44 2010] [info] mod_wsgi (pid=1563): Python has shutdown.

If the daemon process had simply crashed, you wouldn't see these shutdown messages.

If it crashed because of a segmentation fault, then the main Apache
error log will show the segmentation fault message.
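To pick these lifecycle messages out of a large error log, a small Python filter can help. This is a minimal sketch, assuming each log entry sits on one line in the format of the samples above; `scan_log` and `PATTERNS` are hypothetical names, and the segmentation fault pattern is an assumption about how Apache reports a child killed by SIGSEGV, which may differ between Apache versions.

```python
import re
from collections import defaultdict

# Patterns based on the sample mod_wsgi "info" messages quoted in this thread.
# The "segfault" pattern is an assumption about Apache's own crash message
# and may need adjusting for your Apache version.
PATTERNS = {
    "started": re.compile(r"mod_wsgi \(pid=(\d+)\): Starting process '([^']+)'"),
    "died": re.compile(r"mod_wsgi \(pid=(\d+)\): Process '([^']+)' has died"),
    "shutdown": re.compile(r"mod_wsgi \(pid=(\d+)\): Shutdown requested '([^']+)'"),
    "segfault": re.compile(r"child pid (\d+) exit signal Segmentation fault"),
}


def scan_log(lines):
    """Group matching log lines by event as {event: [(pid, group_or_None), ...]}."""
    events = defaultdict(list)
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                pid = int(match.group(1))
                # Only the mod_wsgi patterns capture a process group name.
                group = match.group(2) if pattern.groups > 1 else None
                events[event].append((pid, group))
    return dict(events)
```

Running it over both the main error log and the virtual host error log, and comparing the number of "started" and "died" events per process group, gives a rough picture of whether daemons are being restarted as expected.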

There will be different shutdown messages if something like maximum
requests or an inactivity timeout is defined for the daemon process and
is triggered. I would supply examples, but for whatever reason, in my
current code base, the inactivity timeout doesn't seem to be working.
This may or may not be related, and I am investigating.

BTW, setting:

  LogLevel debug
  WSGIVerboseDebugging On

will give you even more debug messages, but the latter causes logging on
every request, which may not be what you want as it will fill up the logs.

Graham



On 10 February 2010 04:12, Alec Flett <[email protected]> wrote:
>
> On Feb 8, 2010, at 5:25 PM, Graham Dumpleton wrote:
>
>> I should add that you should also read:
>>
>>  http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
>>
>
> I have read this a few times... I feel fairly enlightened, but it still
> doesn't explain how I'm losing daemons.
>
>>
>> The fact that, by default, the Python interpreter still gets created in
>> those processes is probably why you are getting confused. That is, those
>> processes wouldn't be replaced straight away, unlike the daemon mode
>> processes, which would be, as they are part of a fixed-size pool,
>> whereas the main Apache server child processes are effectively part of
>> a dynamically sized pool.
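The static pool described above can be illustrated with a toy supervisor. This is not mod_wsgi's implementation, which is C code running inside Apache; `supervise` and its parameters are hypothetical, and the sketch simply polls fixed worker slots and refills any slot whose process has exited, whatever the reason.

```python
import subprocess
import sys
import time


def supervise(size, command, rounds, poll_interval=0.5):
    """Toy static pool: keep `size` copies of `command` running.

    Every `poll_interval` seconds, for `rounds` rounds, any worker that has
    exited (normally or by crashing) is replaced straight away, so the pool
    never shrinks. Returns the number of replacements made.
    """
    workers = [subprocess.Popen(command) for _ in range(size)]
    restarts = 0
    try:
        for _ in range(rounds):
            time.sleep(poll_interval)
            for slot, proc in enumerate(workers):
                if proc.poll() is not None:  # exited; poll() also reaps it
                    workers[slot] = subprocess.Popen(command)
                    restarts += 1
    finally:
        # Stop whatever is still running so no children are left behind.
        for proc in workers:
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
    return restarts


if __name__ == "__main__":
    # Workers that exit immediately get replaced on every poll.
    print(supervise(2, [sys.executable, "-c", "pass"], rounds=3))
```

A dynamically sized pool, like Apache's prefork children, would instead decide how many workers to keep from the current load (MinSpareServers/MaxSpareServers), so a process that exits need not be replaced immediately.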
>>
>
> But I guess what I'm seeing is that the daemons really are failing to
> recirculate, or something - I'm still in this world where we're suddenly at
> 2-3 daemons, when we started with 24. (I should add that we chose 24 because
> we do have big beefy machines with lots of RAM and cores, plus this
> particular application tends to be CPU-heavy)
>
>> Thus, setting WSGIRestrictEmbedded, and so disabling the default
>> behaviour that sees the Python interpreter still initialised in those
>> processes, may clear things up.
>>
>
> I'll definitely give that a try... at least it should further reduce the
> number of log messages that may be confusing me.
>
> Alec
>
>> Graham
>>
>> On 9 February 2010 10:56, Graham Dumpleton <[email protected]> wrote:
>>>
>>> If you are not using embedded mode, i.e., you are only using daemon
>>> mode, then add the directive:
>>>
>>>  WSGIRestrictEmbedded On
>>>
>>> This will tell mod_wsgi not to bother to initialise the Python
>>> interpreter in the Apache server child processes, given it will not be
>>> required.
>>>
>>> This presumes mod_wsgi 3.X is being used, as 2.X behaves differently.
>>>
>>> That should eliminate those messages and make it clearer what is going
>>> on.
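One way to confirm that requests really are handled by the daemon process group rather than in embedded mode is a small diagnostic WSGI script. This sketch relies on mod_wsgi's `mod_wsgi.process_group` key in the WSGI environ, which holds the daemon process group name and is empty in embedded mode; the script itself is a hypothetical quick check, not part of the configuration discussed here.

```python
def application(environ, start_response):
    """Report whether this request ran in embedded or daemon mode.

    mod_wsgi sets 'mod_wsgi.process_group' to the daemon process group
    name, or to an empty string when the request runs in embedded mode.
    """
    group = environ.get("mod_wsgi.process_group", "")
    mode = "DAEMON MODE (%s)" % group if group else "EMBEDDED MODE"
    body = mode.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```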
>>>
>>> I will explain more later when I have the time to catch up on all my email.
>>>
>>> Graham
>>>
>>> On 9 February 2010 10:42, Alec Flett <[email protected]> wrote:
>>>>
>>>> Ok, I think I'm starting to get a handle on what's going on.
>>>>
>>>> For background, we run in prefork mode. We currently have:
>>>> StartServers         5
>>>> MinSpareServers      5
>>>> MaxSpareServers     10
>>>> ServerLimit         600
>>>> MaxClients          600
>>>> MaxRequestsPerChild  1000
>>>>
>>>> For mod_wsgi I've got maximum-requests=1000
>>>>
>>>> For a bunch of PIDs, these are the mod_wsgi log messages I see:
>>>> pids: 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 434, 473
>>>> Initializing Python.
>>>> Attach interpreter ''.
>>>> Destroying interpreters.
>>>> Cleanup interpreter ''.
>>>> Terminating Python.
>>>> Python has shutdown.
>>>>
>>>> Now I did some exploring and it turns out those PIDs are Apache
>>>> children, NOT mod_wsgi daemons.
>>>>
>>>> I think that Apache is quietly shutting down Apache children, perhaps
>>>> when they reach MaxRequestsPerChild, and this is taking the mod_wsgi
>>>> children down with them, and mod_wsgi is not restarting those children.
>>>> Could there possibly be some off-by-one bug where, if we're on the
>>>> 1000th request, mod_wsgi thinks "kill this child, and restart it" but
>>>> then Apache comes in and kills the child just before it starts?
>>>>
>>>> Alec
>>>>
>>>> On Feb 8, 2010, at 11:38 AM, Alec Flett wrote:
>>>>
>>>>> Ok, I've now found wsgi_manage_process...
>>>>>
>>>>> FWIW I haven't been able to reproduce the crash by calling
>>>>> os.kill(os.getpid(), signal.SIGBUS), and frankly I'm not even sure how
>>>>> specifically our children are crashing, whether it's a SIGBUS or
>>>>> something else. All I know is the state I find the appserver in, and
>>>>> there's little to nothing in the logs.
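For reference, `os.kill()` takes a process ID, so signalling the current process means passing `os.getpid()`. The sketch below assumes a POSIX system (SIGBUS does not exist on Windows) and Python 3.5+ for `subprocess.run`; it runs the crash in a throwaway child interpreter so only the child dies, and `crash_child` is a hypothetical helper. Placing the same `os.kill(os.getpid(), ...)` call inside a WSGI handler would instead kill the daemon process serving that request, which is one way to test whether it gets restarted.

```python
import signal
import subprocess
import sys

# os.kill() takes a process ID; os.getpid() is the child's own PID.
CRASH_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGBUS)"


def crash_child(code=CRASH_CODE):
    """Run `code` in a fresh Python interpreter and return its exit status.

    On POSIX systems a negative status -N means the child was killed by
    signal N, so a SIGBUS crash shows up as -signal.SIGBUS.
    """
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    status = crash_child()
    print("child exit status:", status, "(SIGBUS is %d)" % signal.SIGBUS)
```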
>>>>>
>>>>> I'm going to keep digging...
>>>>>
>>>>> Alec
>>>>>
>>>>> On Feb 8, 2010, at 10:35 AM, Alec Flett wrote:
>>>>>
>>>>>> So I'm still seeing this problem - that our Python processes are
>>>>>> crashing for some reason (our problem, I'm sure) but mod_wsgi isn't
>>>>>> restarting them.
>>>>>>
>>>>>> I just perused the mod_wsgi.c source and I don't see anything that
>>>>>> would restart children if they crashed? In particular, I don't see
>>>>>> anything catching SIGCHLD, but I'm willing to believe the apr_ APIs
>>>>>> are doing this in a different way.
>>>>>>
>>>>>> Also, is there some kind of scoreboard telling which children are
>>>>>> available to receive new requests? The server continues to serve
>>>>>> requests, just without the missing children, which leads me to believe
>>>>>> mod_wsgi has somehow figured out that the dead children are not
>>>>>> allowed to handle new requests.
>>>>>>
>>>>>> Can you point me at the crash-recovery code?
>>>>>>
>>>>>> Alec
>>>>>>
>>>>>> On Jan 28, 2010, at 9:51 PM, Graham Dumpleton wrote:
>>>>>>
>>>>>>> 2010/1/29 Alec Flett <[email protected]>:
>>>>>>>>
>>>>>>>> On Jan 27, 2010, at 3:07 PM, Graham Dumpleton wrote:
>>>>>>>>
>>>>>>>>> It should restart automatically on a crash.
>>>>>>>>>
>>>>>>>>> One cause of what you are seeing is Python threads becoming
>>>>>>>>> deadlocked and, over time, using up the available threads.
>>>>>>>>>
>>>>>>>>> Are you using multithread daemons? Is your code and third party
>>>>>>>>> modules thread safe?
>>>>>>>>>
>>>>>>>>
>>>>>>>> nope, single-threaded! threads=1 on the WSGIDaemonProcess line.
>>>>>>>>
>>>>>>>>> Try setting 'inactivity-timeout=120' as an option to
>>>>>>>>> WSGIDaemonProcess.
>>>>>>>>>
>>>>>>>>
>>>>>>>> great, that seems like a good idea anyway.
>>>>>>>>>
>>>>>>>>> I would also suggest setting LogLevel to 'info' so that additional
>>>>>>>>> information about process restarts is printed in the error logs.
>>>>>>>>>
>>>>>>>> That was going to be my next question ...:)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> This way you might get an idea of what request threads are actually
>>>>>>>>> doing.
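A common way to see what request threads are doing is to dump the stack of every Python thread. This is a generic sketch using the standard library's `sys._current_frames()` (documented, though underscore-prefixed) and `traceback.format_stack()`; `dump_thread_stacks` is a hypothetical name, and how it would be triggered inside a daemon process (a background thread, a signal handler, a special URL) is left open.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report of the current stack of every Python thread."""
    names = {thread.ident: thread.name for thread in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("# Thread %s (%s)" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack() yields one entry per frame, each ending in a newline.
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)
```

Writing the report to the error log periodically in a stuck process would show whether threads are all waiting on the same lock, which points to a deadlock.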
>>>>>>>>>
>>>>>>>> So none of this explains the "missing daemons" problem - where the
>>>>>>>> daemons are not actually starting back up again... As you can see
>>>>>>>> below, I set the display-name so that I can look at the daemons with
>>>>>>>> "ps" - when I do a "ps ax | grep <group>" I only see a few processes
>>>>>>>
>>>>>>> The extra level of logging may show whether processes are doing some
>>>>>>> sort of shutdown. If they are crashing, then you should already see
>>>>>>> segmentation fault messages in the main Apache error log, not the
>>>>>>> virtual host one, so make sure you check both logs.
>>>>>>>
>>>>>>> The processes should be restarted if they truly exit or crash. If it
>>>>>>> is an orderly process restart due to maximum requests or the WSGI
>>>>>>> script file being touched, there is also a fail-safe which defaults to
>>>>>>> 5 seconds. If the process doesn't die in that time, a thread should
>>>>>>> cause it to kill itself. The only way this would not work is if some
>>>>>>> C extension module for Python had registered a competing C-level
>>>>>>> signal handler or blocked signals, and that interfered with mod_wsgi.
>>>>>>> In that case, though, the process would still exist and you should
>>>>>>> still see it.
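The fail-safe idea, a thread that forces the process to exit if an orderly shutdown stalls, can be sketched in Python. This is an illustration only: mod_wsgi implements its fail-safe in C, and `arm_shutdown_failsafe` is a hypothetical helper. It uses `threading.Timer` and `os._exit()`, which ends the process immediately without running any further cleanup.

```python
import os
import threading


def arm_shutdown_failsafe(timeout=5.0, exit_code=1):
    """Force the process to exit if it is still alive `timeout` seconds from now.

    Meant to be armed once an orderly shutdown begins. os._exit() skips
    atexit handlers and other interpreter cleanup, so it still works when
    that cleanup is what has hung.
    """
    def _force_exit():
        os._exit(exit_code)

    timer = threading.Timer(timeout, _force_exit)
    # A daemon thread does not hold the interpreter open, so a normal,
    # timely shutdown is not delayed by the pending timer.
    timer.daemon = True
    timer.start()
    return timer
```

The returned timer can be cancelled if shutdown is aborted; otherwise the process ends either normally or when the timer fires, whichever comes first.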
>>>>>>>
>>>>>>> If it was an Apache restart that triggered the process restart, you
>>>>>>> presumably would have known about it, unless you have some automated
>>>>>>> system which does that. Even so, Apache will kill off any daemon
>>>>>>> processes which don't shut down within 3 seconds.
>>>>>>>
>>>>>>> It also can't be the case that the processes are zombies, because
>>>>>>> that would mean Apache isn't waiting on their exit status, which it
>>>>>>> should be doing.
>>>>>>>
>>>>>>> So, all quite confusing.
>>>>>>>
>>>>>>>> (in fact one of my servers in production has dropped from the
>>>>>>>> original 24 processes down to 7 yesterday, and now only 3 today!)
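A `ps ax | grep <group>` check can also match the grep process itself or unrelated commands that mention the group name. A slightly stricter count, sketched below, parses the PID and command columns and keeps only commands that begin with the daemon's display name. It assumes `display-name=%{GROUP}` makes daemons appear as `(wsgi:<group>)` (check the exact form in your own `ps` output); `daemon_pids` and `live_ps_output` are hypothetical helpers, and the latter needs Python 3.7+ and a POSIX `ps`.

```python
import subprocess


def daemon_pids(ps_output, display_name):
    """Return PIDs whose command starts with `display_name`.

    `ps_output` is expected to have one "PID COMMAND" pair per line and no
    header, as produced by `ps ax -o pid= -o command=`.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split off the PID column only
        if len(parts) == 2 and parts[1].startswith(display_name):
            pids.append(int(parts[0]))
    return pids


def live_ps_output():
    """Capture the process list on this machine (POSIX ps, Python 3.7+)."""
    result = subprocess.run(
        ["ps", "ax", "-o", "pid=", "-o", "command="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `daemon_pids(live_ps_output(), "(wsgi:client-freebase.com)")` would list the daemon PIDs for the configuration quoted in this thread; logging its length over time against `processes=24` shows when daemons go missing.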
>>>>>>>
>>>>>>> Unless you have long-lived requests, 24 processes is actually quite a
>>>>>>> lot. Any well-tuned system should manage with far fewer.
>>>>>>>
>>>>>>> Even with that number of processes, since they are not multithreaded,
>>>>>>> I wouldn't expect you to run out of resources unless your code has a
>>>>>>> problem with not releasing file descriptors. You might, though, use
>>>>>>> lsof, ofiles or another tool to work out whether a large number of
>>>>>>> file descriptors are in use. Even then, if Apache/mod_wsgi can't
>>>>>>> restart processes because of that, you should see error messages in
>>>>>>> the main Apache error log.
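As a scripted alternative to lsof, a rough descriptor count can be read from the filesystem. This is a minimal sketch, assuming Linux's `/proc/<pid>/fd` (inspecting another user's process there needs suitable permissions and otherwise raises PermissionError), with `/dev/fd` as a fallback for the calling process on systems such as macOS; `count_open_fds` is a hypothetical helper.

```python
import os


def count_open_fds(pid="self"):
    """Count the open file descriptors of a process.

    Uses /proc/<pid>/fd on Linux. Falls back to /dev/fd, which only
    describes the calling process, when pid is "self". The count may
    include the descriptor used to read the directory itself.
    """
    proc_dir = "/proc/%s/fd" % pid
    if os.path.isdir(proc_dir):
        return len(os.listdir(proc_dir))
    if pid == "self" and os.path.isdir("/dev/fd"):
        return len(os.listdir("/dev/fd"))
    raise OSError("cannot list file descriptors for pid %r here" % (pid,))
```

Sampling `count_open_fds(pid)` for each daemon PID every few minutes would show whether descriptor counts keep growing.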
>>>>>>>
>>>>>>> Graham
>>>>>>>
>>>>>>>>> Let me know what you find and also post your actual daemon mode
>>>>>>>>> configuration.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here's one of them:
>>>>>>>>
>>>>>>>> #############################
>>>>>>>> # Project: client
>>>>>>>> ##############################
>>>>>>>>
>>>>>>>> WSGIDaemonProcess client-freebase.com processes=24 threads=1 \
>>>>>>>>     display-name=%{GROUP} \
>>>>>>>>     python-path=/mw/app/client_88277/_install/lib/python2.6/site-packages \
>>>>>>>>     maximum-requests=1000
>>>>>>>>
>>>>>>>> WSGIScriptAlias / /mw/app/client_88277/_install/bin/client.wsgi
>>>>>>>>
>>>>>>>> # Server configuration for client
>>>>>>>> <Directory /mw/app/client_88277/_install/bin>
>>>>>>>> WSGIProcessGroup client-freebase.com
>>>>>>>> </Directory>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Graham
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups
>>>>>>>>> "modwsgi" group.
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>> [email protected].
>>>>>>>>> For more options, visit this group at
>>>>>>>>> http://groups.google.com/group/modwsgi?hl=en.
>>>>>>>>>