Okay, I admit this is more of a user query, but since it's a bit more technical than some, I thought I'd ask where more of the developers hang out.
We've just migrated some 2.1.x-based sites from WebSphere 5 on Solaris and Windows (where they were running quite happily) to WebSphere 6 on Linux. And now, from time to time, we get occasional FileNotFound exceptions stating "Too many open files", leading to the pages returning 500s.

This isn't even under heavy load. The sites could have been sitting there untouched for a couple of hours (they're not live yet), but on subsequently opening up all the home pages, a significant proportion of them produce the error. A few minutes later you refresh the pages that gave an error and they come up fine...

We've worked around it by using ulimit to increase the number of available handles (per process, if I read the docs correctly) from the default 1024 to 8000, and haven't seen it happen again. So it looks like we simply hit the limit on simultaneously open files from time to time, and by the time we try again a few minutes later some of the handles have been freed up.

The interesting thing (and the reason for my mail) is that at the time we're seeing the error, doing an lsof on that particular app server's process ID reveals hundreds of open file handles on cocoon.xconf.

So, my question is this: can anyone suggest why this configuration file keeps getting opened so much? It's almost as if every time a configurable sitemap component is called, it opens the file again to check the default configuration, and doesn't release the handle until some time later (possibly once the response has all been sent). But given that the file never changes from request to request (and I'd expect to restart the application if it ever did change anyway), I'd be astonished if Cocoon didn't just read and parse it once at startup and pass the resulting object in to all the components, rather than having them all do it.

Can anyone confirm the file should only be opened once (assuming there's nothing else accessing it outside of Cocoon)? And, if so, can anyone think of a reason we're getting these problems in this particular environment? I'm just wary that if we were to move a bunch more sites onto this infrastructure, we might start hitting some global limit instead.

Here's hoping...

Andy.

--
http://pseudoq.sourceforge.net/ Open source Java Sudoku application
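
P.S. To make clear what I'd have expected, here is a minimal sketch of the "parse once, share the result" pattern. This is not Cocoon's actual code (I haven't dug through it); the class name SharedConfig and its methods are made up purely for illustration. The point is only that the stream is closed as soon as parsing finishes, so the file handle is released straight away.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical holder: parses an XML config file once and shares the result. */
final class SharedConfig {

    private final Document document;

    private SharedConfig(Document document) {
        this.document = document;
    }

    /** Opens the file, parses it, and closes the handle before returning. */
    static SharedConfig load(File file)
            throws IOException, ParserConfigurationException, SAXException {
        InputStream in = new FileInputStream(file);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return new SharedConfig(doc);
        } finally {
            // Release the file descriptor even if parsing fails.
            in.close();
        }
    }

    /** Components would read from the in-memory document, never the file. */
    String rootElementName() {
        return document.getDocumentElement().getNodeName();
    }
}
```

With something along these lines, I'd expect lsof to show a handle on the file only for the duration of the one parse at startup, rather than hundreds of them at request time.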
