On 7 September 2012 04:14, Kent <[email protected]> wrote:

> Is there a way to tell when/whether/how often we hit the condition that an
> http request is fed to mod_wsgi for which there is no currently available
> process thread, so it must wait in queue? Can this be logged? I'm trying
> to figure out how to appropriately size my processes and threads parameters,
> any help there is much appreciated!
First up, go watch:

http://lanyrd.com/2012/pycon/spcdg/
http://lanyrd.com/2012/pycon-au/swkdq/

as they talk a bit about these issues.

What you can do depends on how you are using mod_wsgi. Embedded mode or daemon mode?

With embedded mode there is not much you can do just within Apache/mod_wsgi, because the connection gets queued in the socket listener queue for Apache itself, for which there isn't a great deal of visibility. Apache will only accept a request when it actually has the resources to handle it, so when all processes/threads are busy the request backs up in the listener socket queue, and Apache has no way of knowing how long it sat in that backlog before being accepted.

If you are using daemon mode, you can do a little better, because the web application processes sit behind Apache. You can time stamp a request when Apache accepts it, then look at the difference between that and the current time when the application in the daemon process actually gets to handle the request. This shows where the daemon mode processes are getting overloaded, although it does require the Apache worker processes to still have enough free threads to keep accepting requests and let them back up in the daemon processes rather than in the listener queue; otherwise the time stamp is never applied.

In mod_wsgi 3.4 (just released), all requests are automatically time stamped and the value is made available in the WSGI request environ dictionary as 'mod_wsgi.queue_start'. Doing:

    queue_start = int(value) / 1000000.0

will give you a time stamp in seconds that can then be compared to time.time() to work out how much time elapsed between Apache accepting the request and the web application being passed the request. You could write a little middleware that monitors that.

Beyond queueing time, the next measure you can use is thread utilisation. This is a measure of how much of the capacity of the WSGI server is being used.
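As a rough sketch of such a middleware (the logging destination and class name here are my own choices, not anything mod_wsgi prescribes; only the 'mod_wsgi.queue_start' key and the microseconds conversion come from the above):

```python
import time

class QueueTimeMiddleware:
    """Log the time between Apache accepting a request and the WSGI
    application seeing it, using the 'mod_wsgi.queue_start' value that
    mod_wsgi 3.4 adds to the request environ."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        value = environ.get('mod_wsgi.queue_start')
        if value is not None:
            # mod_wsgi supplies the time stamp in microseconds.
            queue_start = int(value) / 1000000.0
            queue_time = time.time() - queue_start
            # Write to the Apache error log via the standard WSGI
            # error stream; you could feed a metrics system instead.
            environ['wsgi.errors'].write(
                'queue time: %.6f seconds\n' % queue_time)
        return self.application(environ, start_response)
```

Wrap your WSGI entry point as `application = QueueTimeMiddleware(application)` and watch the error log while generating load.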
In effect, it is the time spent serving requests divided by the time that could have been spent serving requests, based on the available number of processes/threads. The value of thread utilisation is that once you head towards 100% and stay at high levels, you know you are starting to run out of capacity. In combination with queueing time: as thread utilisation increases, queueing time will also increase because of the backlog.

Measuring thread utilisation is interesting, but a bit tricky to do in pure Python without doing lots of thread locking, which could impact performance. Using a C extension it can be done with acceptable overhead.

It is important to realise though that these sorts of measures should only be seen as one part of what you should be monitoring. Needing to fiddle things to increase capacity usually means you are doing a poor job of making your application perform better. You know you are doing the right thing when these measures prove you can safely drop processes/threads, not the other way around.

Anyway, the two talks I linked to cover these issues and give examples. Although you can easily measure queueing time yourself, because thread utilisation is tricky, and because all this stuff is better seen as one part of an overall monitoring strategy, it is going to be much easier to just use New Relic, which does all this and more. Queueing time is visible in the New Relic Lite plan if you don't want to pay for New Relic after its trial period ends. The thread utilisation and the capacity analysis reporting based on it are part of the paid level though, so once you drop to Lite you no longer get access to them. You still have the trial period to get an answer to your question. The normal trial period for New Relic is 14 days. Use this URL at the moment and you can get an extended trial:
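To make the definition concrete, here is a pure Python sketch of a thread utilisation tracker (names and structure are my own, not an actual mod_wsgi or New Relic API; note the lock taken on every request, which is exactly the overhead mentioned above):

```python
import threading
import time

class ThreadUtilisation:
    """Track the fraction of available thread capacity actually spent
    handling requests: busy time / (elapsed time * number of threads)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.lock = threading.Lock()
        self.start = time.time()
        self.busy_time = 0.0
        self.active = {}  # thread id -> time current request started

    def request_started(self):
        with self.lock:
            self.active[threading.get_ident()] = time.time()

    def request_finished(self):
        with self.lock:
            started = self.active.pop(threading.get_ident(), None)
            if started is not None:
                self.busy_time += time.time() - started

    def utilisation(self):
        with self.lock:
            now = time.time()
            # Count completed requests plus any still in flight.
            busy = self.busy_time + sum(
                now - started for started in self.active.values())
            capacity = (now - self.start) * self.num_threads
            return busy / capacity if capacity else 0.0
```

A middleware would call `request_started()` on entry and `request_finished()` on exit, and something would periodically report `utilisation()`. Sustained values near 1.0 mean you are running out of capacity.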
http://newrelic.com/30

Having monitoring in place is the best way of working out what is going on, and then using the results to tune your configuration.

Another area to investigate, especially if using embedded mode, is whether you have totally screwed up your MPM settings, or are using the defaults Apache ships with, which aren't very good for Python, especially with the prefork MPM. I have been doing some work in that area, writing scripts which validate the Apache configuration and produce charts showing how it behaves under certain simulated conditions. These tell you if you have stuffed it up and are going to cause Apache to perform badly through basic process management. I have this working for the worker MPM, but not the prefork MPM yet. I am not sure I want to make it available just yet though.

Enough words; a couple of images to whet your appetite.

https://dl.dropbox.com/u/22571016/CapacityAnalysisExample.jpg

This one shows the capacity analysis page in New Relic, giving how much of your server is being used.

https://skitch.com/grahamdumpleton/e1dqj/figure-1

This shows an evaluation of the worker MPM settings Apache ships with in its source code. Not ideal, but still okay.

https://skitch.com/grahamdumpleton/e1dqa/figure-1

This shows an evaluation of poorly chosen MPM settings done by a user. Too many processes were created initially, which were immediately killed off because they were excess to requirements. As the number of concurrent requests increased, the incorrect configuration meant Apache would swap between thinking it needed more processes and thinking it had too many, so the potential existed for it to continually kill off and then restart processes.

You can all mull over those images. Since I am about to go on holidays and won't be online much, my best suggestion is to try New Relic and find the capacity analysis report.
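For reference, a worker MPM fragment of the general shape being evaluated (these numbers are illustrative only, not a recommendation; the point is internal consistency, with MaxClients a multiple of ThreadsPerChild and a spare-thread range wide enough that Apache doesn't repeatedly kill and respawn processes as load varies):

```apache
<IfModule mpm_worker_module>
# 8 processes x 25 threads = 200 total threads.
StartServers          2
ServerLimit           8
ThreadLimit           25
ThreadsPerChild       25
MaxClients            200
# Spare range spans two whole processes' worth of threads,
# so small load swings don't trigger process churn.
MinSpareThreads       25
MaxSpareThreads       75
MaxRequestsPerChild   0
</IfModule>
```

The badly configured case in the last image above is what you get when these directives disagree with each other, e.g. StartServers creating far more threads than MaxSpareThreads allows to stay alive.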
Also keep an eye on the New Relic blog, as there will be a post going up sometime in the next week about the Capacity Analysis report. It also includes additional information about using it to tune one aspect of mod_wsgi daemon mode.

Enjoy the carrots for now. This exploration of MPM settings and evaluating their effectiveness is something I intend to talk about at the next PyCon US, if the talk gets accepted.

Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
