On Thu, Jan 19, 2012 at 8:55 PM, Graham Dumpleton <[email protected]> wrote:
> It isn't out of the ordinary for response times to go up as number of
> concurrent requests is increased. It is the sum total of the
Ok, it's good to know that the basic phenomenon is normal. But does it make sense that the hello-world Django view would have response times that can be very closely modeled with this equation:

    avg time = concurrent reqs x 1 ms + 1 ms

Specifically, I would have thought that since I have 2 cores on this VM, 2 concurrent requests wouldn't go so much slower than 1 request at a time.

I'm seeing scaling problems to a much greater degree on our production systems, but there it was, and may still be, because we're actually maxing out the CPU. To give a little more info about the production system: we've got Varnish sitting in front of the app servers, and it's sometimes returning 503s. I'm trying to track down exactly what's going on. We had very high CPU usage on the app servers, which I improved by changing the app to make fewer, bigger db queries. We also limited the number of connections Varnish will make to an app server, which may have helped. Now CPU usage is usually ok, but responses will still occasionally take a long time.

To be clear, I've got a few different environments/tests here:

1. the static, wsgi, django benchmark [1],
2. the static file benchmark which includes throughput [2], and
3. our production systems.

[1] https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGFncnlUNkdKMVJ4NnYtRjhjaGFIeFE&output=html
[2] http://serverfault.com/questions/344788/why-is-static-page-response-time-going-up-with-increased-concurrent-requests, https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGtiLUc1SWdOeWQ4dGo3VDI5Yk8zbWc&output=html

I thought I'd see if a minimal test case could pinpoint a bottleneck that might be contributing, but maybe I'm barking up the wrong tree, and I should dig into the details of the production systems with New Relic instead of focusing on the minimal benchmarks.

> What you aren't graphing is throughput, so relying on response times
> alone is deceiving.

That makes sense. I'll keep it in mind.
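Just to spell the fitted model out (the two 1 ms constants come from fitting my benchmark numbers in the spreadsheet above, nothing more fundamental), here's a quick Python sketch of it:

```python
# Linear model fitted to the hello-world benchmark data:
# average response time grows by roughly 1 ms per additional
# concurrent request, on top of a ~1 ms base service time.
# The coefficients are fitted values, not a derivation.

def predicted_avg_ms(concurrent_requests, per_request_ms=1.0, base_ms=1.0):
    """Predicted average response time in milliseconds."""
    return concurrent_requests * per_request_ms + base_ms

if __name__ == "__main__":
    for c in (1, 2, 10, 50):
        print(c, predicted_avg_ms(c))
```

The part that surprises me is the slope: with 2 cores I'd have expected the curve to stay nearly flat up to 2 concurrent requests, not climb linearly from the start.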
> https://plus.google.com/114657481176404420131/posts/G1jM6WW3Pnu

Cool post, thanks!

> Whether one configuration is better than another will depend on your
> specific application and whether they are cpu intensive tasks or i/o
> bound tasks.
> At that point for your application you look at changing processes vs
> threads at server level to make it work more efficiently, but usually
> more importantly tuning your application to cut down application and
> database bottlenecks. Get rid of the worst and you will bring down
> response times that way as well.

In our production app, we seem to be using quite a bit of CPU, and at some point we were (are?) bottlenecked on it. This would lead me to think that we'd want more mod_wsgi processes with fewer threads, but then we need more memory, and we're already up against what our current VMs have. Does New Relic have a report on what's using memory, by any chance? :-)

Speaking of application and database bottlenecks (*hijacks discussion in a new direction* :-)), it seems that making db queries is using a lot of CPU! I drastically improved our CPU usage by making our app do more joins so it could get data from the db in fewer queries. I've got johnny-cache set up, so queries involve some extra work, but I wouldn't expect them to use so much CPU when the memcached servers are on separate machines. Does anything come to mind about why this would be happening?

> In those graphs you will see a few things which are worth exploring to
> help tune those things. These are thread utilisation and queuing time.

Very cool. On the production site, since we've got Varnish sitting in front of the app servers and limiting itself to 10 connections per app server, we definitely should have enough Apache workers. It is possible that we're running out of mod_wsgi daemon workers under certain circumstances (we have many sites in separate mod_wsgi daemon process groups, or whatever they're called, with 5 threads each).
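Going back to the join change for a moment: what I did amounts to replacing an N+1 query pattern with a single joined query. Here's a toy sqlite3 sketch of the idea (made-up schema, standard library only; our real app does the equivalent through the Django ORM):

```python
import sqlite3

# Toy in-memory schema, purely illustrative: articles each reference an author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT,
                          author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO article VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 1);
""")

def fetch_n_plus_one():
    """The old shape: one query for articles, then one more per article
    to get its author (N+1 queries total)."""
    rows = []
    for art_id, title, author_id in conn.execute(
            "SELECT id, title, author_id FROM article ORDER BY id"):
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        rows.append((title, name))
    return rows

def fetch_joined():
    """The new shape: everything in a single joined query."""
    return list(conn.execute(
        "SELECT article.title, author.name FROM article "
        "JOIN author ON author.id = article.author_id "
        "ORDER BY article.id"))

assert fetch_n_plus_one() == fetch_joined()
```

Both return the same rows, but the joined version does one round trip instead of N+1, which is where I think most of the CPU savings came from.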
> If you are happy to try mod_wsgi 4.0, you can get access to both
> queuing time and also thread utilisation. It will all change at some
> point, but the current New Relic Python agent is able to grab the data
> and you can graph it with custom views to get those charts I link to.

Interesting. I'd like to pursue the performance problems a little further with mod_wsgi 3. If I don't get anywhere, I'll speak with my team about trying mod_wsgi 4.

> As to your New Relic account, I can't seem to find it by searching for
> email or name. You might let me know the account number that shows in
> the URL so I can look at what data you are getting, and if you try
> mod_wsgi 4.0, I can give you the custom view definition you can set up
> to get that chart.

The account number is 75245. It has an application for our production setup. Thanks for taking a look!

> BTW, don't entirely focus on the application side either. Although you
> may get your application performance times down to 100ms levels, the
> users will not care much if they are still seeing 6 second page load
> times because of page rendering times. Better user satisfaction can
> therefore often be had more quickly by improving the HTML/JavaScript
> sent back to the browser.

That makes a lot of sense.

Thank you so much!
Dan

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
