On Thu, Jan 19, 2012 at 8:55 PM, Graham Dumpleton
<[email protected]> wrote:
> It isn't out of the ordinary for response times to go up as number of
> concurrent requests is increased. It is the sum total of the

Ok. It's good to know that the basic phenomenon is normal. Does it
make sense, though, that the hello-world Django view would have
response times that can be very closely modeled with this equation:

    avg time = concurrent reqs × 1 ms + 1 ms

Specifically, I would have thought that since I have 2 cores on this
VM, 2 concurrent requests wouldn't go so much slower than 1 request at
a time.
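Just to make the fit concrete, here it is as a couple of lines of Python (the 1 ms figures come from my benchmark runs above, so treat them as illustrative, not universal):

```python
def predicted_avg_ms(concurrent_requests, per_request_ms=1.0, base_ms=1.0):
    """Observed fit: average response time grows linearly with concurrency."""
    return concurrent_requests * per_request_ms + base_ms

# With 2 cores I'd naively expect 2 concurrent requests to cost roughly
# the same as 1, but the fit says each concurrent request adds ~1 ms:
print(predicted_avg_ms(1))  # 2.0 (ms)
print(predicted_avg_ms(2))  # 3.0 (ms)
```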

I'm seeing scaling problems to a much greater degree on our production
systems, but there it was (and may still be) because we're actually
maxing out the CPU. To give a little more info about the production
system: we've got Varnish sitting in front of the app servers, and it's
sometimes returning 503s. I'm trying to track down exactly what's
going on. We had very high CPU usage on the app servers, which I
improved by changing the app to make fewer, bigger DB queries. We
also limited the number of connections Varnish will make to an app
server, which may have helped. Now CPU usage is usually ok, but
responses will still occasionally take a long time.

To be clear, I've got a few different environments/tests here:
1. the static, wsgi, django benchmark [1],
2. the static file benchmark which includes throughput [2], and
3. our production systems.

[1] 
https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGFncnlUNkdKMVJ4NnYtRjhjaGFIeFE&output=html
[2] 
http://serverfault.com/questions/344788/why-is-static-page-response-time-going-up-with-increased-concurrent-requests,
https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGtiLUc1SWdOeWQ4dGo3VDI5Yk8zbWc&output=html

I thought I'd see if a minimal test case could pinpoint a bottleneck
that might be contributing, but maybe I'm barking up the wrong tree
and I should dig into the details of the production systems with New
Relic instead of focusing on the minimal benchmarks.

> What you aren't graphing is throughput, so relying on response times
> alone is deceiving.

That makes sense. I'll keep it in mind.

> https://plus.google.com/114657481176404420131/posts/G1jM6WW3Pnu

Cool post, thanks!

> Whether one configuration is better than another will depend on your
> specific application and whether they are CPU-intensive or I/O-bound
> tasks.

> At that point for your application you look at changing processes vs
> threads at server level to make it work more efficiently, but usually
> more importantly tuning your application to cut down application and
> database bottlenecks. Get rid of the worst and you will bring down
> response times that way as well.

In our production app, we seem to be using quite a bit of CPU and at
some point were (are?) bottlenecked on it. That would lead me to think
that we'd want more mod_wsgi processes with fewer threads each, but
then we need more memory, and we're already up against what our current
VMs have. Does New Relic have a report on what's using memory, by any
chance? :-)
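For reference, this is the knob I mean (the group name, counts, and path here are illustrative, not our real config):

```apache
# Illustrative only -- more processes with fewer threads each favours
# CPU-bound work, at the cost of extra memory per process.
WSGIDaemonProcess mysite processes=4 threads=2 display-name=%{GROUP}
WSGIProcessGroup mysite
WSGIScriptAlias / /path/to/mysite/wsgi.py
```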

Speaking of application and database bottlenecks (*hijacks discussion
in a new direction* :-)), it seems that making DB queries is using a
lot of CPU! I drastically improved our CPU usage by making our app do
more joins so it could get data from the DB in fewer queries. I've got
johnny-cache set up, so queries involve some extra work, but I wouldn't
expect them to use so much CPU when the memcached servers are on
separate machines. Does anything come to mind about why this would be
happening?
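To illustrate the kind of change that helped, here's a generic sketch of collapsing an N+1 query pattern into a single join. The schema and queries are a toy example (sqlite3 in memory), not our actual models:

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'X', 1), (2, 'Y', 1), (3, 'Z', 2);
""")

def titles_with_authors_n_plus_one():
    # N+1 pattern: one query for books, then one more query per book.
    rows = conn.execute("SELECT title, author_id FROM book").fetchall()
    result = []
    for title, author_id in rows:
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result

def titles_with_authors_joined():
    # Same data in a single round trip via a join.
    return conn.execute(
        "SELECT b.title, a.name FROM book b "
        "JOIN author a ON a.id = b.author_id"
    ).fetchall()
```

Both functions return the same rows; the joined version just moves the work into one query, which is roughly what our change did at the ORM level.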

> In those graphs you will see a few things worth exploring to help tune
> things. These are thread utilisation and queuing time.

Very cool. On the production site, since we've got Varnish sitting in
front of the app servers and limiting itself to 10 connections per app
server, we definitely should have enough Apache workers. It is
possible that we're running out of mod_wsgi daemon workers under
certain circumstances (we have many sites in separate mod_wsgi daemon
process groups, or whatever they're called, with 5 threads each).

> If you are happy to try mod_wsgi 4.0, you can get access to both
> queuing time and also thread utilisation. It will all change at some
> point, but the current New Relic Python agent is able to grab the data
> and you can graph it with custom views to get those charts I link to.

Interesting. I'd like to pursue the performance problems a little
further with mod_wsgi 3. If I don't get anywhere, I'll speak with my
team about trying mod_wsgi 4.

> As to your New Relic account, I can't seem to find it by searching for
> email or name. You might let me know the account number that shows in
> the URL so I can look at what data you are getting, and if you try
> mod_wsgi 4.0, I can give you the custom view definition you can set up
> to get that chart.

The account number is 75245. It has an application for our production
setup. Thanks for taking a look!

> BTW, don't entirely focus on the application side either. Although you
> may get your application performance times down to 100 ms levels, users
> will not care much if they are still seeing 6-second page load times
> because of page rendering times. Better user satisfaction can
> therefore often be achieved more quickly by improving the
> HTML/JavaScript sent back to the browser.

That makes a lot of sense.

Thank you so much!
Dan

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.