On Wed, Jan 27, 2010 at 11:23:49AM +1100, Graham Dumpleton wrote:
> 2010/1/27 Giel van Schijndel:
>> On Tue, Jan 26, 2010 at 01:46:53PM +1100, Graham Dumpleton wrote:
>>> 2010/1/26 Giel van Schijndel:
>>>> On Tue, Jan 26, 2010 at 01:12:00AM +0100, Giel van Schijndel wrote:
>>>>> On Tue, Jan 26, 2010 at 11:00:42AM +1100, Graham Dumpleton wrote:
>>>>>> 2010/1/26 Graham Dumpleton:
>>>>>>> 2010/1/26 Giel van Schijndel:
>>>>>>>> Some time later, and now it seems that the daemon crashed. At least I
>>>>>>>> think it must have, considering that it didn't leave anything in the
>>>>>>>> logs, except for timeouts around when the daemon must have gone. No
>>>>>>>> coredump either.
>>>>>>>
>>>>>>> I'll give you an updated patch shortly then which includes the other
>>>>>>> changes I figured are required to make it more robust on platforms
>>>>>>> where conditional wait can actually return even though condition not
>>>>>>> satisfied.
>>>>>>
>>>>>> Revert that prior patch and try this one instead:
>>>>>
>>>>> No immediate regressions so far. I.e. it functions properly within a few
>>>>> minutes after restarting Apache with it.
>>>>
>>>> It seems that after the daemon times out due to inactivity it gets
>>>> killed, but it never seems to be respawned when a request arrives after
>>>> that timeout.
>>>
>>> Ensure LogLevel directive is 'info' and see what messages appear in
>>> error log, or if you have them already, from both main Apache error
>>> log and any virtual host log which daemon process is within, post them
>>> for me.
>>
>> Logs (last request + everything that follows):
>>> [Tue Jan 26 15:18:48 2010] [debug] mod_wsgi.c(12695): mod_wsgi (pid=16220): 
>>> Request server matched was 'ilpam.il.fontys.nl|0'.
>>> [Tue Jan 26 15:27:33 2010] [info] mod_wsgi (pid=16220): Daemon process 
>>> deadlock timer expired, stopping process 'ilpam'.
> 
> Hmmm, you were talking about inactivity timeouts, yet above is
> something completely different. The above specifically happens when
> something has locked Python GIL for period of deadlock timeout,
> default 300 seconds, and as such no other Python code can run.

Except that the above doesn't look like 300 seconds to me, more like
515.

I guess I could test this by setting 'threads=1'?

> I can't see from logs, as not enough from before this point, whether
> this occurred on inactivity timeout shutdown or whether distinct. My
> guess is it is distinct as for inactivity timeout, where shutdown
> timeout is 5 seconds by default, there is no way one could get to 300
> seconds and trigger this.
> 
> Can you explain more about how often/when you are seeing this deadlock
> timeout in error logs. Something like that is not good as indicates a
> badly implemented Python C extension module that doesn't release GIL
> when doing blocking operations.

I'm not using any Python C extensions, not directly at least. The only
things (aside from the Python standard library) I use are CherryPy
(3.1.2) and Genshi (0.5.1). It's a rather simple project up until now
[1], currently with as only code a 125 line long Python script [2].

As for how often I see this deadlock. I thought it to be related to the
inactivity timeout because it appears to happen about 500 (+/- 120)
seconds after the last HTTP request gets in.

[1] http://git.il.fontys.nl/git/ilpam.git
[2] http://git.il.fontys.nl/git/ilpam.git/blob/HEAD:/mgt/web/wsgi.py

-- 
With kind regards,
Giel van Schijndel
- Interlink <www.il.fontys.nl>

Attachment: signature.asc
Description: Digital signature

Reply via email to