Re: [modwsgi] nginx + apache2 + modwsgi + django random freeze

Graham Dumpleton Mon, 21 Nov 2011 16:07:23 -0800

Here is an example to clarify now that am connected back to the real
world at last.


[Mon Oct 24 15:29:31 2011] [error] Traceback (most recent call last):
[Mon Oct 24 15:29:31 2011] [error]   File
"/Library/WebServer/Sites/mingus-1/lib/python2.6/site-packages/newrelic-0.5.57.0/newrelic/core/application.py",
line 211, in harvest
[Mon Oct 24 15:29:31 2011] [error]     connection, stats.metric_data())
[Mon Oct 24 15:29:31 2011] [error]   File
"/Library/WebServer/Sites/mingus-1/lib/python2.6/site-packages/newrelic-0.5.57.0/newrelic/core/remote.py",
line 122, in send_metric_data
[Mon Oct 24 15:29:31 2011] [error]     res =
self.invoke_remote(conn,"metric_data",True,self._agent_run_id,self._agent_run_id,self._metric_data_time,now,metric_data)
[Mon Oct 24 15:29:31 2011] [error]   File
"/Library/WebServer/Sites/mingus-1/lib/python2.6/site-packages/newrelic-0.5.57.0/newrelic/core/remote.py",
line 156, in invoke_remote
[Mon Oct 24 15:29:31 2011] [error]     return
self._remote.invoke_remote(connection, method, compress, agent_run_id,
*args)
[Mon Oct 24 15:29:31 2011] [error]   File
"/Library/WebServer/Sites/mingus-1/lib/python2.6/site-packages/newrelic-0.5.57.0/newrelic/core/remote.py",
line 270, in invoke_remote
[Mon Oct 24 15:29:31 2011] [error]     raise Exception("%s failed:
status code %i" % (method, response.status))
[Mon Oct 24 15:29:31 2011] [error] Exception: metric_data failed:
status code 503

So for normal Python traceback you get code snippet, eg:

  return self._remote.invoke_remote(connection, method, compress,
agent_run_id, *args)

I don't give that as can see issues with doing that.

Graham

On 22 November 2011 06:37, Graham Dumpleton <[email protected]> wrote:
> A python tracebck will normally show the actual python code on the line
> number in the file. Because the python code is not shown you will always
> need to go to the original code file and go to the line to work out what
> code was being executed. Often the display of the single line of python code
> from each executing stack frame in the traceback is enough to work it out,
> but no such luxury here am afraid.
>
> I would cut and paste example to explain but can't do that on my phone.
>
> Graham
>
> On Tuesday, 22 November 2011, Rodrigo Campos <[email protected]> wrote:
>> On Mon, Nov 21, 2011 at 3:07 AM, Graham Dumpleton
>> <[email protected]> wrote:
>>> On 20 November 2011 14:08, Rodrigo Campos <[email protected]> wrote:
>>>>
>>>> Also, it seems there is something wrong with the date/hour I didn't
>>>> yet look. It didn't happen so much time between the "Script timed out"
>>>> and the killing. It seems some messages are logged using one time and
>>>> some others using other timezone/something like that. Because I have
>>>> this in the logs, for example:
>>>
>>> The timezone difference will be because one log message comes from
>>> Apache child process and the other from the mod_wsgi daemon process.
>>> If Django is overriding timezone then what is displayed will be
>>> different to what Apache child processes show.
>>
>> It's probably that, then. Thanks :-)
>>
>>>
>>>> Anyone has any idea/tip on how can I further debug this ?
>>>>
>>>> I've seen there are commits in WSGI (not included in any release yet)
>>>> that add a blocked timeout and blocker request parameters
>>>
>>> Yep. The inactivity-timeout has been serving double duty. It was
>>> really intended for when the process is completely idle with no
>>> requests arriving or being handled. In place of a better solution at
>>> the time, also made it so that it would detect case where no requests
>>> arriving but there were existing requests and for the total of those
>>> active requests there was no input or output. So, worked as a poor
>>> failsafe for when the process as a whole hung because of all requests
>>> blocking for some reason.
>>>
>>> In mod_wsgi 4.0 those concepts are properly separated. The
>>> inactivity-timeout is only for case where process is completely idle,
>>> no active requests at all.
>>
>> Is there any estimated date when mod_wsgi 4.0 will be released ?
>> Is it too risky to try to compile todays trunk and use it in
>> production (previous sevral days on testing) ? Are there any "blocker"
>> issue you are aware of ?
>>
>>>
>>> The new options for the busy case are blocked-requests and
>>> blocked-timeout. By default the blocked restart is not enabled. It can
>>> be enabled by setting blocked-timeout in much the same was as
>>> inactivity-timeout option. When blocked-requests is not set, it
>>> defaults to what ever the number of threads is for the mod_wsgi daemon
>>> process.
>>>
>>> What happens is that when the blocked-timeout is defined, when the
>>> number of concurrent requests is equal to or exceeds blocked-requests,
>>> if there is no input or output for any requests for period set by
>>> blocked-timeout, then the process will be restarted.
>>
>> Great, thanks!
>> That's why I asked for these parameters, I've read some similar
>> description on your post on stackoverflow and now saw commits on trunk
>> that seems to implement them. This is a great feature for us right
>> now, we are *just* needing it =)
>>
>>>
>>> So, if blocked-requests isn't set and defaults to total threads, then
>>> when whole process effectively hangs, then it will restart.
>>
>> Right, makes sense
>>
>>> I need to think about how this all works some more and see whether I
>>> can better determine if individual requests are producing no input and
>>> output and so likely blocked. I may yet be able to come up with a
>>> slightly better way of handling this and cause a restart sooner as
>>> soon as no other active requests in addition to the blocked ones. That
>>> way will not have situation where in busy site will still end up
>>> reaching maximum number of threads being blocked before forced
>>> restart.
>>>
>>>> and an other
>>>> one to get some information of the processes/threads (which
>>>> information ?
>>>
>>> This is some experimental stuff that works in conjunction with New
>>> Relic agent to track number of concurrent requests and thread
>>> utilisation. It can be graphed in New Relic using custom views. I am
>>> not entirely happy with interface into mod_wsgi for this at this point
>>> and so will be changing how this works.
>>>
>>>> a call-trace perhaps ? :-)).
>>>
>>> I have just committed some changes that adds a feature where by if a
>>> restart is performed due to blocked requests then it will dump out
>>> what any active requests in the process are doing. You would therefore
>>> see in the Apache logs:
>>
>> Great, thanks a lot!
>>
>>>
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Daemon process
>>> busy inactivity timer expired, stopping process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Shutdown
>>> requested 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Dumping stack
>>> trace for active Python threads.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread
>>> 4320296960 executing file
>>> "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in
>>> function
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from
>>> file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12,
>>> in application.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread
>>> 4320833536 executing file
>>> "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in
>>> function
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from
>>> file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12,
>>> in application.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread
>>> 4321370112 executing file
>>> "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in
>>> function
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from
>>> file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12,
>>> in application.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Thread
>>> 4313858048 executing file
>>> "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 8, in
>>> function
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): called from
>>> file "/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi", line 12,
>>> in application.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 10 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 9 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 5 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 8 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 7 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 6 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 4 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 3 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 2 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:52 2011] [info] mod_wsgi (pid=12256): Exiting thread
>>> 1 in daemon process 'hello-1'.
>>> [Mon Nov 21 15:18:57 2011] [info] mod_wsgi (pid=12256): Aborting
>>> process 'hello-1'.
>>> [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of
>>> script headers: environ.wsgi
>>> [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of
>>> script headers: environ.wsgi
>>> [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of
>>> script headers: environ.wsgi
>>> [Mon Nov 21 15:18:57 2011] [error] [client 127.0.0.1] Premature end of
>>> script headers: environ.wsgi
>>>
>>> It does require LogLevel in Apache to be set to 'info' though. There
>>> is only file name, line number and function name context. It isn't
>>> really feasible to give code line like normal Python tracebacks as
>>> this is reaching from underneath all interpreters and written in C
>>> code. It is happening on shutdown and also has to avoid doing stuff
>>> that might result in GIL being released and other code being run. So,
>>> has to be simple as possible. Should be sufficient though in helping
>>> to target trouble spot.
>>
>> Sorry, I'm not sure I followed you here. The trace gives file name,
>> line number and function name context but the line number is not
>> feasible ? Or what is not feasible ? Sorry, I lost you :S
>>
>>>
>>>> If I understand correctly
>>>> this could be *very* helpful to debug this problem. Also, if we could
>>>> have a trace where the blocked process where hanging it will be *very
>>>> very* useful :-D
>>>>
>>>> Is there some other configuration option that could help ? Do you know
>>>> when will be a new release that includes the "blocked request"
>>>> parameters ?
>>>
>>> Above is all I can think of right now in rush before have to go back
>>> home to no proper internet.
>>
>> That's enough :-)
>>
>>
>>
>>
>>
>>
>> Thanks a lot!
>>
>> Rodrigo
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/modwsgi?hl=en.
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.

Re: [modwsgi] nginx + apache2 + modwsgi + django random freeze

Reply via email to