Graham,
Thanks very much for this excellent information.  The videos are very 
informative and you've got me off to a great start.  

We are using daemon mode with Apache, which is apparently compiled with 
the prefork MPM, the Linux default:
httpd -l
Compiled in modules:
  core.c
  prefork.c
  http_core.c
  mod_so.c

You warned against bad MPM settings, but I don't know where to look to 
determine what *good* MPM settings are.  Can you point me there, or is 
this largely a problem for embedded mode only?

I'd also submit that clearly much of our app time is spent in database 
waits, since our app (for now) runs in the cloud and speaks with a remote 
database hundreds of miles away.  Further, many of our requests take 
quite a long time, some 10 seconds or more, since we are often saving 
quite complex orders with many business rules requiring many database 
round trips.

I can see this type of situation as being ripe for backlog; do you 
agree?  We have 8 CPU cores and, since we had the RAM available anyway, 
after watching your videos I've increased from processes=4 threads=8 to 
processes=16 threads=10 and will monitor from here.
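
For reference, the daemon mode directives behind that change would look 
roughly like the following; the process group name and paths here are 
placeholders, not our actual configuration:

```apache
# Hypothetical daemon mode setup: 16 processes x 10 threads each.
WSGIDaemonProcess myapp processes=16 threads=10 display-name=%{GROUP}
WSGIProcessGroup myapp
WSGIScriptAlias / /path/to/app.wsgi
```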

I'd love to find the time to monitor more extensively and try out New 
Relic, but if anything I've mentioned is raising red flags in your 
mind, please let me know: in all humility, I know that this is not my 
area of expertise.


On Friday, September 7, 2012 1:43:12 AM UTC-4, Graham Dumpleton wrote:
>
> On 7 September 2012 04:14, Kent <[email protected]> wrote: 
> > Is there a way to tell when/whether/how often we hit the condition 
> > that an http request is fed to mod_wsgi for which there is no 
> > currently available process thread, so it must wait in queue?  Can 
> > this be logged?  I'm trying to figure out how to appropriately size 
> > my processes and threads parameters, any help there is much 
> > appreciated! 
>
> First up, go watch: 
>
> http://lanyrd.com/2012/pycon/spcdg/ 
> http://lanyrd.com/2012/pycon-au/swkdq/ 
>
> as it talks a bit about these issues. 
>
> So, what one can do depends on how you are using mod_wsgi. Embedded 
> mode or daemon mode? 
>
> With embedded mode there is not much you can do within 
> Apache/mod_wsgi, because the connection gets queued in Apache's own 
> socket listener queue, into which there isn't a great deal of 
> visibility. Apache doesn't know how long a request may have been 
> sitting in the listener socket backlog queue before it accepts it. 
>
> This arises because Apache will only accept a request when it actually 
> has the resources to handle it. Thus, when all processes/threads are 
> busy, requests will back up in that socket listener queue. 
>
> If you are using daemon mode, you can do a little bit better, because 
> the web application processes sit behind Apache. You can time stamp a 
> request when Apache accepts it and look at the difference between that 
> and the current time when the application in the daemon process 
> actually gets to handle it. 
>
> What this shows is when the daemon mode processes are getting 
> overloaded, although it does require that the Apache worker processes 
> still have enough threads to keep accepting requests and let them back 
> up in the daemon processes rather than the listener queue; otherwise 
> the time stamp is never applied. 
>
> In mod_wsgi 3.4 (just released), all requests are automatically time 
> stamped and the value made available in the WSGI request environ 
> dictionary as 'mod_wsgi.queue_start'. Doing: 
>
>   queue_start = int(value) / 1000000.0 
>
> will give you a time stamp in seconds that can then be compared to 
> time.time() to work out how much time elapsed between Apache 
> accepting the request and the web application being passed the 
> request. You could write a little middleware that monitors that. 
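
A minimal sketch of such a middleware might look like this; the class 
name and logging hook are hypothetical, but 'mod_wsgi.queue_start' is 
the environ key described above:

```python
import time

class QueueTimeMiddleware:
    """Hypothetical WSGI middleware that logs how long a request spent
    queued between Apache accepting it and the application seeing it."""

    def __init__(self, application, log=print):
        self.application = application
        self.log = log

    def __call__(self, environ, start_response):
        value = environ.get('mod_wsgi.queue_start')
        if value:
            # mod_wsgi stores the accept time as an integer count of
            # microseconds; convert to seconds to compare with time.time().
            queue_start = int(value) / 1000000.0
            self.log('queue time: %.3f seconds' % (time.time() - queue_start))
        return self.application(environ, start_response)
```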
>
> Beyond queueing time, the next measure one can use is thread 
> utilisation. This is a measure of how much of the capacity of the WSGI 
> server is being used. In effect it is time spent serving requests 
> divided by time it could have spent serving requests based on 
> available number of processes/threads. 
>
> The value of thread utilisation is that once you head towards 100% and 
> stay at high levels, you know you are starting to run out of capacity. 
>
> In combination with queueing time: as thread utilisation increases, 
> queueing time due to backlog will also increase. 
>
> Measuring thread utilisation is interesting, and a bit tricky to do 
> in pure Python without doing lots of thread locking, which could 
> impact performance. With a C extension one can do it with acceptable 
> overhead. 
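
For illustration only, here is a naive pure-Python sketch of the idea; 
the class and its methods are hypothetical, and this is exactly the 
kind of locking-heavy approach cautioned about above:

```python
import threading
import time

class ThreadUtilisation:
    """Hypothetical helper, not part of mod_wsgi: busy time accumulated
    across requests, divided by the theoretical serving capacity."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of request threads available
        self.lock = threading.Lock()
        self.start = time.time()
        self.busy = 0.0               # total seconds spent handling requests

    def record(self, duration):
        # Call once per completed request with its handling time.
        with self.lock:
            self.busy += duration

    def utilisation(self):
        # Time spent serving requests divided by the time that could
        # have been spent serving requests across all threads.
        available = (time.time() - self.start) * self.capacity
        return self.busy / available if available else 0.0
```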
>
> It is important to realise, though, that these sorts of measures 
> should only be seen as one part of what you should be monitoring. If 
> you need to fiddle settings to increase capacity, it most likely 
> means you are doing a poor job of making your application itself 
> perform better. 
>
> You know you are doing the right thing when these measures prove that 
> you can safely drop processes/threads and not the other way around. 
>
> Anyway, the two talks I link to talk a bit about these issues and give 
> examples. 
>
> Although you can easily measure queueing time yourself, because 
> thread utilisation is tricky and because all this is better seen as 
> one part of an overall monitoring strategy, it is going to be much 
> easier were you to just use New Relic, which does all this and more. 
>
> Queueing time is visible in the New Relic Lite plan if you don't want 
> to pay for New Relic after its trial period ends. The thread 
> utilisation and the resulting capacity analysis reporting, though, 
> are part of the paid level, so once you drop to Lite you don't get 
> access to them anymore. You still have the trial period, though, to 
> get an answer to your question. 
>
> The normal trial period for New Relic is 14 days. Use this URL at the 
> moment and you can get an extended trial. 
>
> http://newrelic.com/30 
>
> So having monitoring in place is the best way of working out what is 
> going on, and then using the results of that to tune your 
> configuration. 
>
> Another area one can investigate, especially if using embedded mode, 
> is whether you have totally screwed up your MPM settings, or are 
> using the defaults Apache ships with, which aren't very good for 
> Python, especially with the prefork MPM. 
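
For concreteness, the MPM settings in question live in a block like the 
following in httpd.conf; the numbers here are purely illustrative 
placeholders, not recommended values:

```apache
# worker MPM settings (values illustrative only)
<IfModule mpm_worker_module>
StartServers          2
ServerLimit           4
ThreadsPerChild      25
ThreadLimit          25
MinSpareThreads      25
MaxSpareThreads      75
MaxClients          100   # must equal ServerLimit * ThreadsPerChild
MaxRequestsPerChild   0
</IfModule>
```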
>
> I have been doing some work in that area as well, writing some 
> scripts which validate the Apache configuration and produce charts 
> showing how it behaves under certain simulated conditions. These tell 
> you if you have stuffed it up in a way that will cause Apache to 
> perform badly through its basic process management. 
>
> I have this stuff working for worker MPM, but not prefork MPM yet. I 
> am not sure I want to make it available just yet though. 
>
> Enough words; a couple of images to whet your appetite. 
>
> https://dl.dropbox.com/u/22571016/CapacityAnalysisExample.jpg 
>
> This one shows the capacity analysis page in New Relic giving how much 
> your server is being used. 
>
> https://skitch.com/grahamdumpleton/e1dqj/figure-1 
>
> This shows evaluation of the default worker MPM settings shipped in 
> the Apache source code. Not ideal, but they can still be okay. 
>
> https://skitch.com/grahamdumpleton/e1dqa/figure-1 
>
> This shows evaluation of poorly chosen MPM settings made by a user. 
>
> Too many processes were created initially, which were immediately 
> killed as excess to requirements. As the number of concurrent 
> requests increased, the incorrect configuration meant Apache would 
> swap between thinking it needed more processes and thinking it had 
> too many, so the potential existed for it to continually kill off and 
> then restart processes. 
>
> You can all mull over those images. 
>
> Since I am about to go on holidays and won't be online much, my best 
> suggestion is just to try New Relic and find that capacity analysis 
> report. 
>
> Also keep an eye out on the New Relic blog as there will be a post 
> going up in the next week sometime about the Capacity Analysis report. 
> It also includes additional information about using it to tune one 
> aspect of mod_wsgi daemon mode. 
>
> Enjoy the carrots for now. This exploration of MPM settings and how 
> to evaluate their effectiveness is something I intend to talk about 
> at the next PyCon US, if the talk gets accepted. 
>
> Graham 
>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/modwsgi/-/tQYWopwYN1QJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.
