For a start, since you are using mod_wsgi daemon mode, ensure mod_headers is enabled in Apache and then add:
    RequestHeader add X-Queue-Start "%t"

That will give you the queuing time on the overview chart. I'll comment
more later when I have a chance to look over the email and data.

Graham

On 21 January 2012 09:07, Daniel Benamy <[email protected]> wrote:
> On Thu, Jan 19, 2012 at 8:55 PM, Graham Dumpleton
> <[email protected]> wrote:
>> It isn't out of the ordinary for response times to go up as the number
>> of concurrent requests is increased. It is the sum total of the
>
> Ok. It's good to know that the basic phenomenon is normal. Does it
> make sense that the hello world Django view would have response times
> that can be very closely modeled with this equation:
>
>     avg time = concurrent reqs x 1 ms + 1 ms
>
> Specifically, I would have thought that since I have 2 cores on this
> VM, 2 concurrent requests wouldn't go so much slower than 1 request at
> a time.
>
> I'm seeing scaling problems to a much greater degree on our production
> systems, but there it was, and may still be, because we're actually
> maxing out the CPU. To give a little more info about the production
> system: we've got Varnish sitting in front of the app servers and it's
> sometimes returning 503s. I'm trying to track down exactly what's
> going on. We had very high CPU usage on the app servers, which I
> improved by changing the app to make fewer and bigger db queries. We
> also limited the number of connections Varnish will make to an app
> server, which may have helped. Now CPU usage is usually OK, but
> responses will still occasionally take a long time.
>
> To be clear, I've got a few different environments/tests here:
> 1. the static, WSGI, Django benchmark [1],
> 2. the static file benchmark which includes throughput [2], and
> 3. our production systems.
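As a quick sanity check on that fitted model: combined with Little's law it
actually implies throughput keeps rising even as response times climb, which
is the earlier point about latency graphs on their own being deceiving. A
rough sketch (the 1 ms constants are just the fitted values from the equation
above, not anything measured here):

```python
def predicted_avg_ms(concurrent, per_request_ms=1.0, overhead_ms=1.0):
    # Dan's empirical fit: avg time = concurrent reqs x 1 ms + 1 ms.
    return concurrent * per_request_ms + overhead_ms

def throughput_rps(concurrent, avg_ms):
    # Little's law: requests/second = concurrency / average response time.
    return concurrent / (avg_ms / 1000.0)

for n in (1, 2, 4, 8):
    avg = predicted_avg_ms(n)
    print(n, avg, round(throughput_rps(n, avg), 1))
```

Under this model, going from 1 to 2 concurrent requests pushes average
latency from 2 ms to 3 ms, yet throughput still rises from 500 to roughly
667 requests/second, so the latency chart and the throughput chart tell
quite different stories.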
>
> [1] https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGFncnlUNkdKMVJ4NnYtRjhjaGFIeFE&output=html
> [2] http://serverfault.com/questions/344788/why-is-static-page-response-time-going-up-with-increased-concurrent-requests,
>     https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGtiLUc1SWdOeWQ4dGo3VDI5Yk8zbWc&output=html
>
> I thought I'd see if a minimal test case could pinpoint a bottleneck
> that might be contributing, but maybe I'm barking up the wrong tree
> and I should dig into the details of the production systems with New
> Relic instead of focusing on the minimal benchmarks.
>
>> What you aren't graphing is throughput, so relying on response times
>> alone is deceiving.
>
> That makes sense. I'll keep it in mind.
>
>> https://plus.google.com/114657481176404420131/posts/G1jM6WW3Pnu
>
> Cool post, thanks!
>
>> Whether one configuration is better than another will depend on your
>> specific application and whether its tasks are cpu intensive or i/o
>> bound.
>>
>> At that point for your application you look at changing processes vs
>> threads at the server level to make it work more efficiently, but
>> usually, more importantly, at tuning your application to cut down
>> application and database bottlenecks. Get rid of the worst and you
>> will bring down response times that way as well.
>
> In our production app, we seem to be using quite a bit of CPU and
> sometimes were (are?) bottlenecking on it. This would lead me to think
> that we'd want to have more mod_wsgi processes with fewer threads, but
> then we need more memory and we're already up against what our current
> VMs have. Does New Relic have a report on what's using memory by any
> chance? :-)
>
> Speaking of application and database bottlenecks (*hijacks discussion
> in a new direction* :-)), it seems that making db queries is using a
> lot of CPU!
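On db queries using a lot of CPU: much of the cost is often per-query
overhead (ORM work, cache-key computation in something like johnny-cache,
driver round trips) rather than the data itself, so collapsing an N+1 query
pattern into one join tends to save roughly in proportion to the query
count. A toy illustration using sqlite3 (the schema here is made up; in
Django terms the join version is roughly what select_related gives you):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'X', 1), (2, 'Y', 2), (3, 'Z', 1);
""")

# N+1 pattern: one query for the books, then one more per book for its author.
books = conn.execute("SELECT title, author_id FROM book").fetchall()
authors = [conn.execute("SELECT name FROM author WHERE id = ?", (aid,)).fetchone()[0]
           for _, aid in books]
n_plus_one_queries = 1 + len(books)  # 4 round trips for 3 books

# Join pattern: the same data in a single query.
rows = conn.execute("""
    SELECT book.title, author.name
    FROM book JOIN author ON author.id = book.author_id
""").fetchall()
join_queries = 1
```

The per-query overhead is paid four times in the first version and once in
the second, even though both end up with the same data.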
> I drastically improved our CPU usage by making our app do more joins
> so it could get data from the db in fewer queries. I've got
> johnny-cache set up, so queries involve some extra work, but I
> wouldn't expect them to use so much CPU when the memcached servers are
> on separate machines. Does anything come to mind about why this would
> be happening?
>
>> In those graphs you will see a few things worth exploring to help
>> tune things. These are thread utilisation and queuing time.
>
> Very cool. On the production site, since we've got Varnish sitting in
> front of the app servers and limiting itself to 10 connections per app
> server, we definitely should have enough Apache workers. It is
> possible that we're running out of mod_wsgi daemon workers under
> certain circumstances (we have many sites in separate mod_wsgi daemon
> process groups, with 5 threads each).
>
>> If you are happy to try mod_wsgi 4.0, you can get access to both
>> queuing time and also thread utilisation. It will all change at some
>> point, but the current New Relic Python agent is able to grab the
>> data and you can graph it with custom views to get those charts I
>> link to.
>
> Interesting. I'd like to pursue the performance problems a little
> further with mod_wsgi 3. If I don't get anywhere, I'll speak with my
> team about trying mod_wsgi 4.
>
>> As to your New Relic account, I can't seem to find it by searching
>> for email or name. You might let me know the account number that
>> shows in the URL so I can look at what data you are getting and, if
>> you try mod_wsgi 4.0, I can give you the custom view definition you
>> can set up to get that chart.
>
> The account number is 75245. It has an application for our production
> setup. Thanks for taking a look!
>
>> BTW, don't entirely focus on the application side either.
>> Although you may get your application performance times down to 100 ms
>> levels, users will not care much if they are still seeing 6 second
>> page load times because of page rendering. Better user satisfaction
>> can therefore often be achieved more quickly by improving the
>> HTML/JavaScript sent back to the browser.
>
> That makes a lot of sense.
>
> Thank you so much!
> Dan
>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/modwsgi?hl=en.
