Hi,
lately at OpenGeo we've been having some trouble keeping a few WMS demos up due to exceptional load. Looking into them, it's easy to see that our WMS does not defend itself against excessive workload, as I reported in a jira almost 2 years ago: http://jira.codehaus.org/browse/GEOS-1127
I've been given some time to try and provide a few solutions that can land in the GeoServer 1.7.x series, in order to make exposing a GeoServer WMS in the wild less of a concern. Since we're talking about 1.7.x, the changes have to be as non-invasive as possible, but the idea and the configuration should be portable unchanged to trunk, where we can find a fuller, more extensive solution (time and funding permitting, that is... if the above jira teaches anything, it is that finding resources to pull this off is harder than it would seem at first sight...).

Mail thread wise, I would suggest we stick to what can be done on 1.7.x, since I have no mandate to do a full-fledged solution on trunk, only to make simple changes to 1.7.x. If you feel the proposed solutions are not good for 1.7.x, or are not good at all, just say so: I will stop my attempt and we'll go back to waiting for resources for a fuller solution. I also encourage anybody interested to start discussing the fuller solution in a separate thread, so that we have a design ready for estimates should anyone with funds be interested in having it implemented.

The following are the items I'm thinking about for the 1.7.x branch.

Memory usage
--------------------------------------------------------

A way to limit the memory used by each request. WMS requests use quite a lot of memory because of the need to set up the drawing surface, which usually takes width * height * 4 bytes (4 bytes per pixel). So a 1024x1024 image takes up 4MB of memory (this is the quite typical 4x4 GWC metatile). If someone is determined enough, and has access to a big enough dataset on the server side, they can make a request with a custom style that takes up 99% of the heap without going OOM itself, but makes any other legitimate request go OOM. Even without a big dataset, you can make a loop of big enough requests and obtain the same effect.
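To make the arithmetic concrete, here is a minimal sketch in plain Java. It is not existing GeoServer code: the class name, the method names and the idea of expressing the limit in MB are assumptions for illustration only, and it only accounts for the drawing surface, not for whatever else a request allocates.

```java
// Hypothetical helper, not part of GeoServer: estimates the drawing surface
// size of a WMS request and compares it against a configured limit.
final class RequestMemoryCheck {

    private RequestMemoryCheck() {}

    /** Bytes needed for the drawing surface, assuming 4 bytes per pixel. */
    static long estimateBytes(int width, int height) {
        // Widen to long before multiplying so huge requests cannot overflow int.
        return (long) width * height * 4L;
    }

    /**
     * True if the estimated surface is bigger than the limit.
     * A limit of 0 or less is taken to mean "no limit configured" (assumption).
     */
    static boolean exceedsLimit(int width, int height, long maxRequestMemoryMb) {
        if (maxRequestMemoryMb <= 0) {
            return false;
        }
        return estimateBytes(width, height) > maxRequestMemoryMb * 1024L * 1024L;
    }
}
```

With a 4MB limit a 1024x1024 request is exactly at the limit and passes, while a 2048x2048 one (16MB) would be rejected before any rendering starts.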
Now, I think external tools can be used to throttle too many requests coming from a single host, but those tools won't be able to assess the image size being requested. So one config item I would like to add is image size. As per Gabriel's suggestion in private mail, an x MB per request cap seems to be a good one. It would be a global WMS parameter, simple to check, and I would like to land a patch for this in 1.7.x without adding the param to the UI, and add the UI in trunk instead. The parameter could be a new full-fledged field, or an entry in the metadata map. I would prefer the former.

Time usage
--------------------------------------------------------

A request taking too much time to execute is no good. In WFS this requirement has been translated from a time limit into a feature count limit, and even in that case we had to allow admins to turn off bounds computation on the returned feature collections, because that single thing could take minutes on big data sets. WMS wise we could do the same, but in the end a request can take a lot of time because of many features, or because of a few gigantic ones.

Gabriel provided a solution at the NY sprint that involves setting up a thread pool that executes the rendering, and that can be timed out based on configuration (and that can also limit how many threads actually perform rendering). I have some reservations about applying this kind of solution on 1.7.x, for a couple of reasons:
- it always requires two threads per request, one provided by the container that is executing the http request, and another doing the actual rendering in the thread pool
- it changes the way the request is executed even if the admin did not activate it

I was considering a lower tech solution based on a timer. A timer is started before the rendering starts, with the timeout as its delay. If the rendering terminates within that time, the timer is just cancelled.
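To give an idea of what the timer approach could look like, here is a rough sketch in plain Java. It is not GeoServer code: RenderTimeout, renderWithTimeout and the two Runnable arguments are made-up names, the first Runnable standing in for the rendering call and the second for whatever has to happen on expiry. java.util.Timer and TimerTask are the standard JDK classes.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not part of GeoServer.
final class RenderTimeout {

    private RenderTimeout() {}

    /**
     * Runs the rendering in the calling thread. If it is still running after
     * timeoutMillis, onTimeout is invoked from the timer thread.
     * Returns true if the timeout fired.
     */
    static boolean renderWithTimeout(final Runnable render,
                                     final Runnable onTimeout,
                                     long timeoutMillis) {
        if (timeoutMillis <= 0) {
            // Option not enabled: the main path is exactly what it was before.
            render.run();
            return false;
        }
        final AtomicBoolean timedOut = new AtomicBoolean(false);
        // Daemon timer, so a stuck request cannot keep the JVM alive.
        Timer timer = new Timer("wms-render-timeout", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                timedOut.set(true);
                onTimeout.run();
            }
        }, timeoutMillis);
        try {
            render.run();
        } finally {
            // Rendering ended (normally or not): the timer is just cancelled.
            timer.cancel();
        }
        return timedOut.get();
    }
}
```

This version creates one Timer, and thus one extra thread, per request, which keeps the sketch short. Sharing a single Timer among all requests and cancelling only the individual TimerTask would be a possible refinement.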
If the timer fires instead, it calls the stop() method on the renderer, and for good measure it also disposes of the graphics the renderer was using, so that coverage rendering is killed as well. Mind, this ends up using extra threads as well, but if the option is not enabled the main path is not modified at all.

At that point, we can decide whether to throw a service exception, or return the partially generated image with some marker showing it timed out. I would go for the former.

Configuration wise, I suggest we add a WMS timeout specified in seconds and, again, add only the config option to 1.7.x and provide a UI for it on trunk.

Number of rendering errors
--------------------------------------------------------

The StreamingRenderer has been developed for a long time with uDig as the use case. One of the effects of this shows up in its "best effort rendering", which means the renderer skips features it cannot render and goes on. Typical issues that may arise during rendering are reprojection problems and invalid geometries, but also data source connections suddenly being severed. In the face of this, the renderer just keeps on going, eventually wasting a lot of time handling exceptions.

I would like to add a max errors setting inside the renderer. It was there once, and an error counter is still available in the code, but most of it has been removed. This can also be implemented as a listener, yet listeners are kind of heavy in that they are informed of each feature rendered, not only of errors.

There is also the fact that by implementing timeouts we already make it impossible for this "best effort rendering" to keep the CPU busy for more than x seconds. Having this knob has its own merit though, as handling exceptions over and over is an expensive and useless way to burn CPU cycles.
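As an illustration of the max errors idea, here is a small self-contained sketch in plain Java. It does not use the StreamingRenderer API: ErrorBudget, recordError and renderAll are made-up names, each "feature" is just a Runnable that may throw, and treating a limit of 0 or less as "unlimited" is my assumption.

```java
import java.util.List;

// Hypothetical sketch, not part of GeoTools/GeoServer.
final class ErrorBudget {

    private final int maxErrors; // <= 0 means unlimited, i.e. plain best effort
    private int errorCount;

    ErrorBudget(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Records one rendering error; returns true when rendering should stop. */
    synchronized boolean recordError(Exception e) {
        errorCount++;
        return maxErrors > 0 && errorCount > maxErrors;
    }

    synchronized int getErrorCount() {
        return errorCount;
    }

    /**
     * Best effort loop: skips features that fail, but gives up once the
     * budget is exceeded. Returns the number of features rendered.
     */
    static int renderAll(List<Runnable> features, ErrorBudget budget) {
        int rendered = 0;
        for (Runnable feature : features) {
            try {
                feature.run();
                rendered++;
            } catch (RuntimeException e) {
                if (budget.recordError(e)) {
                    break; // too many errors, stop wasting time on exceptions
                }
            }
        }
        return rendered;
    }
}
```

With maxErrors set to 3, a layer whose data source connection has been severed would make the loop give up on the fourth failure instead of failing once per feature.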
Questions
--------------------------------------------------------

Justin, to make sure, what's the effort involved in adding an option to the configuration in a way that it goes straight to the services, without the need to add it to the UI in 1.7.x? I think it would require changing the xml reader/writer classes and the involved ServiceInfo class, and that would be it, assuming the patch goes down and grabs it? I guess if I use the metadata map I would not even need to change the reader/writer classes or the ServiceInfo, but only the service code, right?

Conclusion
--------------------------------------------------------

While there are other items in the checklist of a more solid server (like disallowing custom styles or disabling certain output formats), the above seem to offer the best bang for the buck, and I believe I can implement them in the time I've been given (16 hours, for the record).

Feedback welcome.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.
