I don't really have anything useful to contribute except for some thoughts about the solutions presented.
First, I don't view this as too much of a GeoServer flaw, but more as a limitation of the platform it runs on. I don't think it's unreasonable to expect people running a server that is going to be supporting access from a tiled client to run it with an appropriate amount of memory. I recently saw these same issues and just upped the memory to 256m (did not try with 128m) and saw no more issues running the styler for a few hours.

That said, if it is agreed that this should be addressed, my preferred solution would be the rendering queue idea (rough sketch at the bottom of this mail), as I see it fitting in nicely with the longer term goal of having GeoServer support the idea of jobs and a job queue that can be tweaked by the server admin, as well as provide real-time job information. However, I see container-specific plugins as an interesting path as well, and it has the nice effect of not having to touch the core (besides perhaps adding an extension point here or there). I see either solution as viable, but I also see just documenting the limitation and stressing appropriate configuration as equally viable.

2c.

-Justin

Andrea Aime wrote:
> Hi all,
> recently I was investigating an OOM reported by a user
> that was basically just using OpenLayers with tiles
> and meta-tiling on a single machine (so one user connected
> to GeoServer).
>
> The result of the investigation is not completely new, but
> it's worrisome anyways.
> Basically the user was moving around a lot using OL,
> panning and zooming, and the VM was configured as default,
> which on the platform of this example meant only
> having 64M of memory.
> Each request resulted in the building of a 3x3 meta tile,
> though of course not all requests triggered that, as the
> code prevents the same meta tile from being computed in parallel
> by more than one thread.
>
> I've added some machinery to get a count of the concurrent
> requests working in parallel, and usually the count was 6
> (which is the default Firefox max connections), but if
> someone starts zooming around while OL is still asking
> for the tiles of the current level, boom, one can
> easily get up to 30-40 concurrent requests and the OOM
> is pretty much guaranteed.
>
> The thing is, Firefox gives up on the older requests,
> but GeoServer does not know that until it actually tries
> to write anything to the response, which happens only
> after the rendering is fully done.
> Given that each meta tile uses 2+MB of memory,
> it does not take much to fill up a 64MB heap
> (especially since a good part of it
> is already filled with the HSQL EPSG database cache,
> around 19MB; hopefully switching to H2 will give
> us some breathing room in the future).
>
> We really need to find a way to make GeoServer stop working
> on requests that the client has dropped.
>
> I've looked around a bit, here is what I've found.
> Apache in CGI mode kills the CGI process as soon
> as the connection is dropped.
> In Java we cannot, because we're using threads, and the
> threads share resources; one cannot kill one without
> bad consequences.
>
> I looked into the servlet API but could find no "supported" way
> to actually guess if the client connection is still alive
> or not; it seems one actually has to try and write something
> on the output.
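Just as a sanity check on the numbers above (assuming the usual 256x256 pixel tile size and an uncompressed 4 bytes per ARGB pixel -- those two figures are my assumptions, the rest are Andrea's):

  768 * 768 pixels (3x3 meta tile of 256x256 tiles) * 4 bytes  ~= 2.3 MB per meta tile
  30-40 meta tiles in flight                                   ~= 70-90 MB
  64 MB heap, minus the ~19 MB EPSG cache                      ~= 45 MB actually available

so a burst of panning and zooming really is enough to sink a default heap.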
> I asked on the Sun J2EE servlet forum and got a couple of answers:
> http://forums.sun.com/thread.jspa?threadID=5408542
>
> The idea of trying to flush() periodically seems to be a good
> one; I've read in other places that flushing the output
> stream should not turn the response into committed status:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4362880
>
> The reason it's important that flush() does not commit
> the response is that by the time one commits the response
> the headers have to be set and cannot be modified,
> and our dispatch system sets them only after the response
> object has been created (the fully rendered image, in our case).
> Since we want to try periodic flush() calls during the rendering
> we would be in trouble, as the headers are set only after that.
>
> Alternatively, or in parallel to this, we could make sure no more
> than X threads are rendering.
> This could be done by using a concurrent queue limited in
> size, with each rendering action trying to push a token into it
> and ending up waiting if it is full.
> This would solve the OOM, but would
> make all the new requests wait for the older ones to be
> dropped, basically making the GS WMS unusable for a while.
> Failing everything else, this may not be such a bad idea.
> With a little generalization we could apply this at the
> dispatcher level and allow the administrator to set limits
> on the number of requests GS is serving for each service
> (typically you can serve many more WFS requests in parallel
> than WMS ones).
>
> Another option that comes to mind is to get our hands
> dirty and write plugins that leverage container specific
> APIs to check if the connection is still alive.
> Downside: it would work only for specific versions of
> specific containers, and I haven't checked if such an API
> exists at all.
>
> Well, does anybody have experience with this? Suggestions?
>
> Cheers
> Andrea

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.
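P.S. To make the two ideas a bit more concrete, here are a couple of rough sketches. None of these classes exist in GeoServer today, the names are made up, and I haven't run either against a real container; they are only meant to show the shape of the thing.

The rendering queue Andrea describes (a bounded queue of tokens; a renderer blocks when the queue is full):

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  /**
   * Sketch of a rendering throttle: at most MAX_RENDERERS threads
   * render at any one time, the rest block until a slot frees up.
   */
  public class RenderingThrottle {

      private static final int MAX_RENDERERS = 4; // would be admin configurable

      // each rendering thread parks a token here before it starts
      private final BlockingQueue<Object> tokens =
              new ArrayBlockingQueue<Object>(MAX_RENDERERS);

      /** Blocks while MAX_RENDERERS renderings are already in progress. */
      public void enter() throws InterruptedException {
          tokens.put(new Object());
      }

      /** Frees the slot for the next waiting request. */
      public void exit() {
          tokens.poll();
      }
  }

which the WMS (or, generalized, the dispatcher) would wrap around the actual work:

  throttle.enter(); // may throw InterruptedException
  try {
      // render the meta tile as usual
  } finally {
      throttle.exit();
  }

And the periodic flush() probe, as a callback the renderer could invoke every so often. The open question remains whether flush() really fails fast on a dropped connection (and avoids committing the response) on the containers we care about:

  import java.io.IOException;
  import javax.servlet.ServletOutputStream;
  import javax.servlet.http.HttpServletResponse;

  /**
   * Sketch of an "is the client still there?" probe. If the flush
   * fails we assume the connection was dropped and the renderer
   * should abort instead of finishing the meta tile.
   */
  public class ClientAliveProbe {

      private final HttpServletResponse response;

      public ClientAliveProbe(HttpServletResponse response) {
          this.response = response;
      }

      /** Returns false if the client appears to have gone away. */
      public boolean stillConnected() {
          try {
              ServletOutputStream out = response.getOutputStream();
              out.flush(); // hopefully does not commit the response, per the Sun bug above
              return true;
          } catch (IOException e) {
              return false;
          }
      }
  }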
