[Oops... just realised I have been replying to Michael, not the list. Very sorry.]
I had a closer look at the trace and the ft2font code, and I'm still none the wiser. It's clear what's happening; I just have no idea why.

I have tried to trigger it in a test-case by calling imp.load_dynamic on ft2font repeatedly, using both threads and processes (which I didn't think would work, but I was clutching at straws), but still no joy. I had a brief look at the Python source too (importdl.c and import.c), which does cache the module objects but admittedly doesn't do any locking that I can see.

All of this has been working on the assumption that it is indeed a race condition of some sort, and even if that is the case I'm _also_ unsure where it could arise from. We are now running on a 24 core machine (up from 8 on the previous server, which had no problems), but my understanding of what memory is shared where in which apache/mod_python/wsgi configuration is too fuzzy to make sense of that possibility. (Also, recall that the stack trace was captured from apache in single-threaded debug mode!)

My current plan to fix it is to push the offending imports from module top-level down into the functions where they are required, but even assuming that is successful I would dearly love closure on this!

/Mark.

On 19 May 2011 12:24, Mark Hepburn <mark.hepb...@gmail.com> wrote:
> I spoke too soon, I hit one! (I am unreasonably excited by this at this
> stage). It looks like it's the same issue; it's in FT2Image and arises from
> check_unique_method_name -- I'm about to look through the source, but it
> seems a likely candidate.
> The output of both bt and bt full is attached.
> Thanks once again.
> /Mark.
>
> On 19 May 2011 12:15, Mark Hepburn <mark.hepb...@gmail.com> wrote:
>>
>> Hi, thanks for the reply.
>> I haven't managed to extract one yet; any hints? I've tried a few times
>> with "gdb httpd" -> run -X, but unsuccessfully so far.
>> My understanding is that this runs apache in single-threaded mode, and if
>> it is a threading problem it is unlikely to reproduce the problem (I
>> think). (The other complicating factor is that this is the only server it
>> has been a problem on... which is also the production server, so I've been
>> loath to slow it down too much like this. Biting the bullet now, though..)
>> There's no stack trace in the apache error log either; in fact there's not
>> even a time-stamp when it crashes, just the message from the subject.
>> Thanks again, Mark.
>> On 19 May 2011 00:55, Michael Droettboom <md...@stsci.edu> wrote:
> Can you provide a stack trace -- either a Python one, or a gdb one?
>
> Mike
>
> On 05/18/2011 03:25 AM, Mark Hepburn wrote:
>> Hi,
>>
>> I have a web application using matplotlib which is unpredictably
>> crashing with the error message from the subject. It seems to be
>> happening in ft2font, but I can't be certain at this stage that it's
>> only occurring there (although since isolating it via logging
>> statements, every time it has occurred has been in that spot). The
>> crash occurs at load time, seemingly through a chain of import
>> statements (starting with wsgi app -> django -> my app):
>> matplotlib.colorbar -> matplotlib.lines -> matplotlib.font_manager ->
>> matplotlib.ft2font
>>
>> Google is strangely quiet on that particular message; the closest I
>> have found that also involves ft2font was this rather old one:
>> http://comments.gmane.org/gmane.comp.python.matplotlib.devel/1332
>>
>> The unpredictable nature of it suggests that it's thread-related, but
>> other than that I have no further clues. The unpredictable nature of
>> the crashes obviously makes testing any theory or avenue quite slow at
>> times! Does anyone have any suggestions, hints for further
>> probing,... anything, please?
>>
>> The particulars:
>> Server OS: openSUSE 11.3 (x86_64)
>> matplotlib: 1.0.0 (compiled from source distro)
>> Server: apache prefork, mod_wsgi
>> Python version: 2.6.4
>>
>> Extra factors:
>> There are two versions of the application, deployed in virtualenvs
>> (identical matplotlib versions). It does affect both of them,
>> although I've only been investigating with one. It frequently seems
>> to affect a group of processes; that is, reloading is required
>> multiple times before it returns to normal.
>>
>> mod_wsgi is running in embedded mode, but the same problem was
>> occurring with mod_python -- that was my main impetus for porting to
>> wsgi in fact. The same application ran fine on the previous server
>> however (SUSE Linux Enterprise Server 11 (x86_64)), in fact with 3
>> versions of the application, using mod_python. It was previously
>> using matplotlib 0.98.5.2; according to my commit message the upgrade
>> was prompted by the server move and that version not compiling against
>> libpng1.4 on the new server.
>>
>> Thanks, Mark.
>>
>>
>
>
> ------------------------------------------------------------------------------
> What Every C/C++ and Fortran developer Should Know!
> Read this article and learn how Intel has extended the reach of its
> next-generation tools to help Windows* and Linux* C/C++ and Fortran
> developers boost performance applications - including clusters.
> http://p.sf.net/sfu/intel-dev2devmay
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>

--
Where the hell is Mark: http://blog.everythingtastesbetterwithchilli.com/
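For anyone who wants to repeat the threaded stress test mentioned at the top of this message, here is a minimal sketch of the idea. It is not the test case that was actually run. It assumes Python 3 (it uses threading.Barrier, whereas the server above runs 2.6.4), and it swaps imp.load_dynamic, which is gone from current Python, for importlib.import_module, so it exercises the ordinary import path rather than the dynamic loader directly. The name hammer_import and the stand-in module colorsys are hypothetical choices for illustration only.

```python
import importlib
import sys
import threading


def hammer_import(module_name, n_threads=8, rounds=20):
    """Import module_name from n_threads threads at once, rounds times over.

    Returns the list of exceptions raised by any worker. A hard crash
    (such as a segfault inside a C extension) would kill the process
    instead of showing up in this list.
    """
    errors = []
    errors_lock = threading.Lock()

    def worker(barrier):
        try:
            barrier.wait()  # release all threads at the same moment
            importlib.import_module(module_name)
        except Exception as exc:
            with errors_lock:
                errors.append(exc)

    for _ in range(rounds):
        # Drop the cached entry so each round goes through the import
        # machinery again instead of returning straight from sys.modules.
        sys.modules.pop(module_name, None)
        barrier = threading.Barrier(n_threads)
        threads = [
            threading.Thread(target=worker, args=(barrier,))
            for _ in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return errors


if __name__ == "__main__":
    # "colorsys" is a harmless stdlib stand-in (an assumption for this
    # sketch); substitute the module under suspicion.
    print(hammer_import("colorsys"))
```

Note that threads only cover one of the two cases tried above; with apache prefork each worker is a separate process, so a clean run of this sketch would not rule out a problem there.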