[Oops... just realised I have been replying to Michael, not the list.
Very sorry.]

I had a closer look at the trace and the ft2font code, and I'm still
none the wiser.  It's clear what's happening; I just have no idea why.
I have tried to trigger it in a test case by calling imp.load_dynamic
on ft2font repeatedly, from both threads and processes (I didn't think
that would work, but I was clutching at straws), but still no joy.
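A minimal sketch of that kind of stress test. It assumes a modern Python 3 rather than the 2.6 on the server (imp.load_dynamic no longer exists in recent Python 3), so instead of reloading inside one process it starts many fresh interpreters at once, each importing the target module. That roughly mimics prefork Apache children starting in parallel. The helper names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def _import_in_child(module_name):
    """Import module_name in a brand-new interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c",
         "import importlib, sys; importlib.import_module(sys.argv[1])",
         module_name],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def stress_import(module_name, n_children=16, rounds=4):
    """Start n_children interpreters concurrently, `rounds` times over.

    Returns (round, child, returncode, stderr) for every child that failed.
    A negative returncode means the child was killed by a signal (a crash,
    e.g. SIGSEGV); a positive one means an ordinary Python exception.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=n_children) as pool:
        for r in range(rounds):
            results = pool.map(_import_in_child, [module_name] * n_children)
            for child, (code, err) in enumerate(results):
                if code != 0:
                    failures.append((r, child, code, err))
    return failures
```

On the affected server this would be run as `stress_import("matplotlib.ft2font", n_children=24, rounds=50)`; an empty list means no child failed.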

I had a brief look at the Python source too (importdl.c and import.c),
which does cache the module objects but, as far as I can see, doesn't
do any locking.
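To illustrate what locking around a uniqueness check means, here is a hypothetical Python analogue (not PyCXX's actual code) of a method table whose `add_method` rejects duplicate names. Without a lock, two threads could both pass the "is this name free?" check before either registers it. Holding one lock across the check and the insert makes the pair atomic, so duplicates are always reported regardless of timing.

```python
import threading


class MethodRegistry:
    """Hypothetical stand-in for a per-type method table with a uniqueness check."""

    def __init__(self):
        self._methods = {}
        self._lock = threading.Lock()

    def add_method(self, name, func):
        # Check and insert under the same lock so no other thread can
        # register `name` between the membership test and the assignment.
        with self._lock:
            if name in self._methods:
                raise ValueError("method %r already registered" % name)
            self._methods[name] = func

    def names(self):
        with self._lock:
            return sorted(self._methods)
```

With the lock in place, registering the same name from several threads at once yields exactly one success and a ValueError for every other attempt.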

This has all been working on the assumption that it is indeed a race
condition of some sort, and even if that is the case I'm _also_ unsure
where it could arise from.  We are now running on a 24-core machine
(up from 8 on the previous server, which had no problems), but my
understanding of what memory is shared where in which
Apache/mod_python/WSGI configuration is too fuzzy to make sense of
that possibility.  (Also, recall that the stack trace was captured
from Apache in single-threaded debug mode!)
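One way to replace guesswork about the process/thread layout with data is a small WSGI middleware that logs the PID and thread of every request. This is a sketch; `trace_middleware` and the logger name are made up. Under prefork with embedded mod_wsgi one would expect many PIDs but only one thread per PID, so log lines showing several threads in the same PID just before a crash would contradict the single-threaded assumption.

```python
import logging
import os
import threading
import time

log = logging.getLogger("wsgi.trace")


def trace_middleware(app):
    """Wrap a WSGI app so every request logs which process and thread served it."""

    def wrapped(environ, start_response):
        thread = threading.current_thread()
        log.info(
            "pid=%d thread=%s ident=%s time=%.3f path=%s",
            os.getpid(),
            thread.name,
            thread.ident,
            time.time(),
            environ.get("PATH_INFO", ""),
        )
        # Hand the request on unchanged; only the logging is added.
        return app(environ, start_response)

    return wrapped
```

In the .wsgi script this would be applied as `application = trace_middleware(application)`.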

My current plan to fix it is to push the offending imports from module
top-level down into the functions where they are required, but even
assuming that is successful I would dearly love closure on this!
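A sketch of that change, assuming a Django view module: the imports move from module top level into a helper that runs only when a chart is actually drawn, so loading the views module no longer pulls in matplotlib.ft2font. The explicit lock is an extra precaution added here, not part of the plan above, and may be redundant with Python's own import locking. `get_module` and `chart_png` are hypothetical names.

```python
import importlib
import io
import threading

# RLock rather than Lock, so a nested get_module() call made while a module
# is being imported in the same thread cannot deadlock.
_import_lock = threading.RLock()


def get_module(name):
    """Import `name` on first use, one thread at a time."""
    with _import_lock:
        # import_module returns the cached module from sys.modules on later calls.
        return importlib.import_module(name)


def chart_png(values):
    """Hypothetical view helper: matplotlib is imported only when a chart is drawn."""
    matplotlib = get_module("matplotlib")
    matplotlib.use("Agg")  # non-GUI backend for a web server
    pyplot = get_module("matplotlib.pyplot")
    buf = io.BytesIO()
    fig = pyplot.figure()
    pyplot.plot(values)
    fig.savefig(buf, format="png")
    pyplot.close(fig)
    return buf.getvalue()
```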

/Mark.


On 19 May 2011 12:24, Mark Hepburn <mark.hepb...@gmail.com> wrote:
> I spoke too soon; I hit one!  (I am unreasonably excited by this at this
> stage.)  It looks like it's the same issue: it's in FT2Image and arises from
> check_unique_method_name.  I'm about to look through the source, but it
> seems a likely candidate.
> The output of both bt and bt full is attached.
> Thanks once again.
> /Mark.
>
> On 19 May 2011 12:15, Mark Hepburn <mark.hepb...@gmail.com> wrote:
>>
>> Hi, thanks for the reply.
>> I haven't managed to extract one yet; any hints?  I've tried a few times
>> with "gdb httpd" -> run -X, but unsuccessfully so far.  My understanding is
>> that this runs Apache in single-threaded mode, so if it is a threading
>> problem it is unlikely to reproduce (I think).  (The other complicating
>> factor is that this is the only server it has been a problem on... which is
>> also the production server, so I've been loath to slow it down too much
>> like this.  Biting the bullet now, though.)
>> There's no stack trace in the apache error log either; in fact there's not
>> even a time-stamp when it crashes, just the message from the subject.
>> Thanks again, Mark.
>>
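When the interpreter dies inside C code there is no Python traceback to log. On Python 3.3+ the standard-library faulthandler module can dump the Python-level stack of every thread on fatal signals such as SIGSEGV or SIGABRT. A third-party backport of the same name was published on PyPI for older Pythons; whether it supports 2.6 would need checking. A minimal sketch to run early in the WSGI script, with a hypothetical log path:

```python
import faulthandler
import os
import tempfile

# Hypothetical location; it must be writable by the user Apache runs as.
CRASH_LOG = os.environ.get(
    "CRASH_LOG", os.path.join(tempfile.gettempdir(), "wsgi-crash.log")
)

# Keep the file object alive for the life of the process: faulthandler
# writes straight to its file descriptor when a fatal signal arrives.
_crash_log = open(CRASH_LOG, "a")
faulthandler.enable(file=_crash_log, all_threads=True)
```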


On 19 May 2011 00:55, Michael Droettboom <md...@stsci.edu> wrote:
> Can you provide a stack trace -- either a Python one, or a gdb one?
>
> Mike
>
> On 05/18/2011 03:25 AM, Mark Hepburn wrote:
>> Hi,
>>
>> I have a web application using matplotlib which is unpredictably
>> crashing with the error message from the subject.  It seems to be
>> happening in ft2font, but I can't be certain at this stage that it's
>> only occurring there (although since I isolated it with logging
>> statements, every crash has occurred in that spot).  The
>> crash occurs at load time, seemingly through a chain of import
>> statements (starting with wsgi app ->  django ->  my app):
>> matplotlib.colorbar ->  matplotlib.lines ->  matplotlib.font_manager ->
>> matplotlib.ft2font
>>
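The logging-based isolation can be made systematic with a small helper (hypothetical, for illustration) that imports each module of the chain in order and logs before and after every step. If the process dies, the last "importing" line without a matching "imported" line names the module whose initialisation crashed. For the chain above it could be called innermost first: `import_chain(["matplotlib.ft2font", "matplotlib.font_manager", "matplotlib.lines", "matplotlib.colorbar"])`.

```python
import importlib
import logging

log = logging.getLogger("import.chain")


def import_chain(names):
    """Import each module in order, logging around every step.

    Returns the imported modules. Standard logging handlers flush after
    each record, so the last line written survives a hard crash.
    """
    modules = []
    for name in names:
        log.info("importing %s", name)
        modules.append(importlib.import_module(name))
        log.info("imported %s", name)
    return modules
```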
>> Google is strangely quiet on that particular message; the closest I
>> have found that also involves ft2font was this rather old one:
>> http://comments.gmane.org/gmane.comp.python.matplotlib.devel/1332
>>
>> The unpredictable nature of it suggests that it's thread-related, but
>> other than that I have no further clues.  The unpredictability also
>> makes testing any theory or avenue quite slow at times!  Does anyone
>> have any suggestions, hints for further probing... anything, please?
>>
>> The particulars:
>> Server OS: openSUSE 11.3 (x86_64)
>> matplotlib: 1.0.0 (compiled from source distro)
>> Server: apache prefork, mod_wsgi
>> Python version: 2.6.4
>>
>> Extra factors:
>> There are two versions of the application, deployed in virtualenvs
>> (identical matplotlib versions).  It does affect both of them,
>> although I've only been investigating with one.  It frequently seems
>> to affect a group of processes; that is, reloading is required
>> multiple times before it returns to normal.
>>
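Since both virtualenvs are affected, it may be worth confirming that each one really loads its own compiled ft2font and not, say, a system-wide copy. A sketch using Python 3's `importlib.util.find_spec` (the helper names are hypothetical) that reports where modules would be loaded from. For a dotted name such as "matplotlib.ft2font" the parent package is imported, but the extension module itself is only located, not loaded.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be loaded from, or None if it isn't found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def describe_environment(names):
    """Map each module name to its origin, plus the interpreter prefix in use."""
    report = {"sys.prefix": sys.prefix}
    for name in names:
        report[name] = module_origin(name)
    return report
```

Running `describe_environment(["matplotlib", "matplotlib.ft2font"])` inside each deployment should show paths under that deployment's own virtualenv.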
>> mod_wsgi is running in embedded mode, but the same problem was
>> occurring with mod_python -- that was my main impetus for porting to
>> wsgi in fact.  The same application ran fine on the previous server
>> however (SUSE Linux Enterprise Server 11 (x86_64)), in fact with 3
>> versions of the application, using mod_python.  It was previously
>> using matplotlib 0.98.5.2; according to my commit message, the upgrade
>> was prompted by the server move, because that version would not compile
>> against libpng 1.4 on the new server.
>>
>> Thanks, Mark.
>>
>>
>
>
> ------------------------------------------------------------------------------
> What Every C/C++ and Fortran developer Should Know!
> Read this article and learn how Intel has extended the reach of its
> next-generation tools to help Windows* and Linux* C/C++ and Fortran
> developers boost performance applications - including clusters.
> http://p.sf.net/sfu/intel-dev2devmay
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>



-- 
Where the hell is Mark:
http://blog.everythingtastesbetterwithchilli.com/
