2009/7/10 Mike McGrath <[email protected]>:
>
> On Tue, Jun 30, 2009 at 5:01 PM, Graham
> Dumpleton<[email protected]> wrote:
>>
>> 2009/6/30 Graham Dumpleton <[email protected]>:
>>> 2009/6/30 Ricky Zhou <[email protected]>:
>>>> Hey, I'm working with Mike on debugging these 500s.
>>>>
>>>> On 2009-06-30 08:40:52 AM, Graham Dumpleton wrote:
>>>>> Any chance you can upgrade to mod_wsgi 2.5. Various issues have been
>>>>> fixed between 2.1 and 2.5 and wouldn't rule out that problem has
>>>>> already been fixed.
>>>> Sure, we can try to reproduce with mod_wsgi 2.5.  As a side
>>>> question, do you think the various bug fixes from 2.1 to 2.5 are
>>>> significant enough that the mod_wsgi package in the EPEL repository
>>>> should be updated?
>>>
>>> The main bug fixes are number 4 in:
>>>
>>>  http://code.google.com/p/modwsgi/wiki/ChangesInVersion0202
>>>
>>> Number 2, 4 and 5 in:
>>>
>>>  http://code.google.com/p/modwsgi/wiki/ChangesInVersion0204
>>>
>>> The wsgi.file_wrapper fixes may not be an issue if TG doesn't use it.
>>>
>>> The main one I would be worried about is the memory corruption problem.
>>> This has been the source of a few obscure random problems.
>>>
>>>> We usually try to only update EPEL packages for
>>>> important bugfixes in order to keep those packages as stable as
>>>> possible, but I'd be happy to talk to the maintainer about it if it
>>>> should be updated.
>>>
>>> My 2.X versions are pretty well bug fixes only. Version 3.X is next
>>> major feature update.
>>>
>
> We are now at version mod_wsgi-2.5-1.el5 and have set
> WSGIApplicationGroup %{GLOBAL}.  We have not yet moved down to one
> thread but that is the next on our list.  We did get some error logs
> captured - http://mmcgrath.fedorapeople.org/error_logs.txt if it helps
> any.

That you are getting:

  [Fri Jul 10 00:04:25 2009] [notice] child pid 29021 exit signal Aborted (6)

is interesting.

A SIGABRT is not something that normally just happens, at least not
in the way that a SIGSEGV is caused by things like dereferencing a
bad pointer. A SIGABRT normally only occurs where abort() is called
explicitly. Thus, some code must be calling that function directly, or
must have assert() macros compiled in, with an assertion check failing
and calling abort(). See the manual pages for abort() and assert().
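
To illustrate how an explicit abort() turns into the "exit signal
Aborted (6)" line above, here is a minimal sketch, assuming a POSIX
system and a modern Python 3. The function name is just something I
made up for illustration. It starts a child interpreter which calls
abort() via os.abort() and reports which signal killed it.

```python
import signal
import subprocess
import sys


def exit_signal_of_aborting_child():
    """Run a child interpreter that calls abort() and return the signal that killed it."""
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX a negative returncode -N means the child was killed by signal N.
    return -child.returncode


print(exit_signal_of_aborting_child() == signal.SIGABRT)
```

The parent sees the same thing Apache sees for its child process: the
process did not exit normally but was terminated by SIGABRT.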

At a guess I would therefore say that some C code being used directly
or indirectly has assertion checks and one is failing.

Normally a failing assert() writes details of the assertion to
stderr. Under Apache though, because stderr is buffered, that
information may not show up in the Apache error logs if assert() does
not explicitly flush stderr after writing it. In other words, the
details of the assertion may be getting lost. That may not be the case
though, so I suggest you have a good look through the error logs for
any line which doesn't have the leading date/time stamp on it. If this
is actually a virtual host with its own error log, check the main
Apache error log as well.
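
Looking for such lines by eye is tedious, so a rough sketch of a filter
follows. It assumes the error log uses the "[Fri Jul 10 00:04:25 2009]"
style prefix seen in the line quoted above. The function name and the
second sample line are invented for illustration and are not taken
from your logs.

```python
import re

# Prefix such as "[Fri Jul 10 00:04:25 2009]" (assumed log format).
TIMESTAMP = re.compile(
    r"^\[[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\]"
)


def lines_without_timestamp(lines):
    """Return (line_number, text) for non-blank lines lacking the timestamp prefix."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if line.strip() and not TIMESTAMP.match(line)
    ]


sample = [
    "[Fri Jul 10 00:04:25 2009] [notice] child pid 29021 exit signal Aborted (6)\n",
    # Invented example of what an assertion message could look like.
    "httpd: example.c:42: example_func: Assertion `ptr != NULL' failed.\n",
]
for number, text in lines_without_timestamp(sample):
    print(number, text)
```

To run it over a real log, pass an open file object instead of the
sample list, once for the virtual host log and once for the main log.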

The only other concrete thing I can suggest right now is that, if you
do work out how to replicate it, you attach GDB to the daemon process
and then trigger the problem to get a stack trace out of GDB.
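
As a rough sketch of what attaching could look like, the following
builds a non-interactive gdb command line. It assumes gdb is installed
and that you run it as a user allowed to attach to the daemon process.
The helper name and the process ID are placeholders. The idea is that
gdb attaches, lets the process continue until a signal such as SIGABRT
stops it, and then prints a backtrace for every thread.

```python
def gdb_backtrace_command(pid):
    """Build a gdb command that attaches to pid, continues, then dumps all stacks."""
    return [
        "gdb",
        "-p", str(pid),                  # attach to the running process
        "-batch",                        # run the commands below, then exit
        "-ex", "continue",               # resume until a signal stops the process
        "-ex", "thread apply all bt",    # backtrace of every thread
    ]


# 12345 is a placeholder, substitute the real daemon process ID.
print(" ".join(gdb_backtrace_command(12345)))
```

Start the resulting command, for example through subprocess.run(),
before sending the request that triggers the failure.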

One long shot is to write a little C extension module for Python
which, when loaded, registers at the C code level a signal handler for
SIGABRT, and have that signal handler call fflush(stderr). Strictly
speaking you shouldn't be doing that in a signal handler, but it may be
enough to flush out the buffered information about the assertion
failure.
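
A related option needs no C code at all, but note that it is a
different technique from the fflush() handler described above and it
assumes Python 3.7 or later. The standard faulthandler module installs
its own C level handler for SIGABRT and writes the Python traceback
straight to stderr. It will not recover the buffered assertion text,
but it can show which Python code was running at the time. The sketch
below demonstrates it on a throwaway child process. The names in it
are made up for illustration.

```python
import subprocess
import sys

# Child enables faulthandler and then calls abort() via os.abort().
CHILD_CODE = "import faulthandler, os; faulthandler.enable(); os.abort()"


def stderr_of_aborting_child():
    """Run the child and return whatever it wrote to stderr before dying."""
    child = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.PIPE,
        text=True,
    )
    return child.stderr


print(stderr_of_aborting_child())
```

In an application you would call faulthandler.enable() once, early in
the WSGI script file.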

Graham
