Bruce -
Thanks; that's a good start. 5,000 requests at a time is a lot. If you estimate that the average user will look at a map for 10 seconds before requesting another one, you're talking about 50,000 simultaneous human users.

Thanks for posting the map file - that helps. There is one huge thing you should be aware of and think about right away - the difference between mode=browse and mode=map. If you are using browse mode with an HTML template (as it appears you are), you should reconsider that decision.

When you use MapServer in browse mode, it needs to generate the map image, legend image, etc. and then write them all to a local disk before returning an HTML template to the client with the URLs of those images embedded in it. Writing to disk is the slowest thing you can do on your server. Having 5,000 simultaneous disk writes, with simultaneous reads to the same disk, is very nearly pure evil. That is a huge drag on performance. You really can't sustain nearly as many users on a given machine in browse mode as you could in map mode. What kind of disk subsystem is being used for the temp files?

Map mode (mode=map) is a simple HTTP request for an image, and the image, after being built, is sent directly to the client with no disk I/O involved. Much nicer. Much faster.

If you must use browse mode, then I'd suggest you create a RAM disk to hold the temp images, with appropriate monitoring to clean out old files promptly. But can you pre-build a set of legends, etc. so you can just embed those and not crank them out every time? You can also get some benefit from storing your input shapefiles on a RAM disk, but don't devote any RAM to that purpose if you're still writing temp files to hard disk.

Let me first toss off a few generic suggestions that probably won't help much but are worth doing:

1. Get rid of unused fonts in your font file. MapServer will reach out and touch each one for each map request.
2. If your shapefiles don't change often, preprocess them to create a separate shapefile for each CLASS, so you don't have to filter as much.
3. If you must use CLASSes, organize them so the most commonly used class comes first in each LAYER.
4. If you have a certain number of combinations of shapefiles used in a given request, create multiple map files with only those layers. For example, if your 12 layers are one set of 4 base layers that are always used, and then 8 more layers only one of which is displayed at a time, create 8 map files, each with only 5 layers instead of 12 - the four base layers and one specific overlay layer. Then use your application code to figure out which map file to select.

And make sure you're using shptree to generate spatial indexes for all your shapefiles!

This looks like it should be an application capable of running pretty darn quickly.

- Ed

Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA 01863
[EMAIL PROTECTED]
Phone: +1 (978) 251-4242
Fax: +1 (978) 251-1396

From: UMN MapServer Users List [mailto:[EMAIL PROTECTED]] On Behalf Of Bruce Cheney
Sent: Thursday, November 08, 2007 12:17 PM
To: [email protected]
Subject: Re: [UMN_MAPSERVER-USERS] System Configuration

Many good questions. I will see if I can catch them all. A bit on the nature of the application.

* We are only using vector data (we assumed that raster would slow it down).
* We are serving many different maps stored separately with the same composition of layers. Each map has 6 layers (4 polygon and 2 point layers for labeling).
* Each of the different map sets has differing quantities of features. For the most significant layers they average around 10,000 features but may be as high as 100,000.
* A majority of the requests are to display a small area of one of the maps, so the rendering focuses in on a few features.
  The user will query a database, which lets them view the map zoomed to the area of interest.
* No layer reprojecting (we assumed this would also slow it down).
* The output map is PNG with dimensions 419 x 403.
* We are using PHP_mapscript to generate the requests. The parameters for the map generation come from a database and the user-requested location, so there are a few lines of code to find the location on the map and generate the images.
* The mapfile contains about 12 layers: several layers to display the primary polygon layer thematically and a couple extra to show the polygons with outlines.
* Data is stored in shapefiles.

I made attempts to streamline the use of extra features to ensure the speed. I certainly may be using items that hurt instead of help. Here is the mapfile. Now that I look at the mapfile, there may be a couple of items that I originally had intended to use but are now just relics and time wasters.

<<postforwebforum.map>>

Now as for the users. We are assuming 5000 simultaneous - all at the same instant. This would assume a substantially larger group of users accessing the site at the same time. We assume this to be the peak stress for Stage 1 of the app.

Bruce -

My channeling sensors went off when Frank rang <g>. It is certainly true that a bit of experience and contemplation can help you discover optimization opportunities that aren't immediately self-evident.

Can you describe the nature of your map application? Are you using raster data, vector data, or both? What size is your data, in numbers of features and/or files? What kind of disk subsystem is being used? Is there layer reprojection going on?

Generalizations are rarely helpful (except for this one). It's like being told the average man is 5' 7" tall - it tells you nothing about how tall I am. MapServer performance depends on a number of factors, but the best place to start is a detailed understanding of what exactly you're trying to do with MapServer.
It would be most helpful to us if you could post your map file and a sample URL request, preferably one that is externally (publicly) visible.

And can you define what you mean by "simultaneous" users? Do you mean 5,000 map requests all being generated at exactly the same time? Or do you mean 5,000 human users asking for a new map every X seconds or so? And if the latter, what value are you using for X?

- Ed

Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA 01863
Phone: 978-251-4242, Fax: 978-251-1396
[EMAIL PROTECTED]

-----Original Message-----
From: UMN MapServer Users List [mailto:[email protected]] On Behalf Of Frank Warmerdam
Sent: Wednesday, November 07, 2007 8:26 PM
To: [email protected]
Subject: Re: [UMN_MAPSERVER-USERS] System Configuration

Bruce Cheney wrote:
> We have been given a requirement to support 5000 simultaneous users.
> What we are finding is that MapServer bogs down around 400
> simultaneous users on a test machine. It looks like it is likely
> slowing because of the threading issue. We haven't tested on a
> production machine but are estimating that it should support double
> what our test machine could handle (double the processor and RAM). So
> at least 800 simultaneous users. Divide that out with the 5000 and we
> need a minimum of 6-7 web servers supporting MapServer. We will
> certainly scale this as is needed, but I do need some idea going in
> as to what is going to be required.

Bruce,

I'm curious how many map requests per minute you expect 800 simultaneous users to generate.

> Does this sound like results that others expect, or is this quantity
> above what others have tested? Also, does anyone know of a solution in
> the works to make MapServer thread safe and/or up the overall
> speed? I am not complaining about the speed, just wondering what is in
> the works.
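The numbers in the quoted estimate can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the 10-second interval is one assumed value for Ed's X, not a measured figure.

```python
import math


def requests_per_minute(users: int, seconds_between_requests: float) -> float:
    """Average request rate if each user asks for a new map every X seconds."""
    if seconds_between_requests <= 0:
        raise ValueError("seconds_between_requests must be positive")
    return users * 60.0 / seconds_between_requests


def servers_needed(target_users: int, users_per_server: int) -> int:
    """Whole servers required to carry target_users, rounding up."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    return math.ceil(target_users / users_per_server)


if __name__ == "__main__":
    # 5000 users at an estimated 800 users per production server.
    print(servers_needed(5000, 800))
    # 800 users, assuming (hypothetically) one new map every 10 seconds each.
    print(requests_per_minute(800, 10))
```

Rounding up matters here: 5000 / 800 is 6.25, so six servers would fall short and the estimate lands on seven. The request rate depends entirely on the value chosen for X, which is why the definition of "simultaneous" changes the answer so much.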
In various aspects MapServer is already thread safe, though there are also known "unsafe" components, and some components are wrapped by big locks that significantly reduce the value of multiple threads. Progress occurs by fits and starts, largely based on support from user organizations depending on multi-threading. For instance, in 5.0 I implemented locking around OGR for a client of mine in Australia. (This is a subtle way of suggesting you hire someone to make this happen if it is what you want!)

All this aside, by default MapServer is *massively multi-threaded*. I say this since the default operation is to start a new CGI instance for each request - each is essentially an independent thread. Of course, the downside of whole-process CGI-style multithreading is that very little context is preserved from request to request. Map files, data file headers, etc. all need to be reparsed for each request.

My point here is that you need to think carefully about the application flow to take much advantage of multiple threading within a single process.

Also, if I may channel Ed, if you want to squeeze more performance out of MapServer, you really need to start by figuring out what it is spending its time doing. Where is it spending its time?

 o waiting for disk? (perhaps you are reading more data than you need?)
 o rendering? (perhaps your data is overdense, or you are using expensive rendering options?)
 o parsing mapfiles? (perhaps your mapfile has too many unused layers?)
 etc.

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org
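One way to start answering "where is it spending its time?" is to wrap each stage of a request in a timer and add up the totals over many requests. The sketch below is generic Python, not MapServer or MapScript code: StageTimer, handle_request and the stage names are hypothetical, and the stand-in workloads would be replaced by the real mapfile load, data read and draw calls in your own application.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage across many requests."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, total_seconds, calls) rows, slowest stage first."""
        rows = [(name, self.totals[name], self.counts[name]) for name in self.totals]
        return sorted(rows, key=lambda row: row[1], reverse=True)


def handle_request(timer):
    # Hypothetical stand-ins for the three stages named in the message above.
    with timer.stage("parse_mapfile"):
        sum(range(1000))
    with timer.stage("read_data"):
        sum(range(1000))
    with timer.stage("render"):
        sum(range(1000))


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(100):
        handle_request(timer)
    for name, total, calls in timer.report():
        print(f"{name:15s} {total:.4f}s over {calls} calls")
```

Totals over many requests are more useful than a single timing, since one slow disk read or cold cache can dominate any individual request.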
