Hi Robert,

On Fri, May 7, 2010 at 2:48 AM, Robert Collins <[email protected]> wrote:
> I was at devopsdownunder last weekend and saw a demo of a very
> interesting tool. Have a look at
> http://rpm.newrelic.com/v2/accounts/12842/applications/113766 -
> ignoring the bling, its a tool for individual and aggregated
> statistics on *every single request* going through an application
> stack.
>
> Kind of what we get with oops reports (database time, python time) but
> pervasive rather than only-on-the-broken-requests.
>
> I think one of the challenges with performance work at the moment -
> and please, correct me if I'm wrong - is that individual developers
> can't easily, routinely see where things are at. Right now, when
> someone asks 'why is xxx slow', the best we can do is:
> - add ++oops++ to the url to trigger an oops
> - wait 3 +- 3 minutes for it to sync
> - look it up on the oops website
>
> This has two issues:
> - we can't see if its *usual* for that page to be slow, or if its
> unusually slow for one individual.
> - its slow and cumbersome.
>
> For instance, if we want 100ms page generation, it would be terribly
> useful to be able to see that right now, on average, we're spending
> (say) 60ms in the database.
>
> Now, I'm not suggesting we go out and invent such a dashboard itself -
> there's going to be a tonne of investment needed to do that, but
> perhaps there is an open source version of this out there already for
> zope apps? Or perhaps we could look at providing a zope plugin to talk
> to newrelic?
I'd go as far as saying that there would be nothing Zope-specific about
collecting the kind of metrics that would be interesting. The great
majority of stats would be collected at the HAProxy/Squid/Apache level,
and the remaining ones would likely come from Storm or other subsystems,
like RabbitMQ for Landscape. Finding out whether the threads of a certain
Zope app server are exhausted might be useful, but that's the only
Zope-specific thing that comes to mind.

One thing we recently added to Landscape was a little debugging helper
that can be enabled during development, and looks like this:

http://www.ubuntu-pics.de/bild/58528/selection_061_X2Xt9L.png

The situation is the same. We have a way to collect some metrics, but if
they are not aggregated there's not much point in having them except
during development.

I have a lot of interest in the subject of collecting metrics and
analyzing bottlenecks. I even have a book or two around that I recommend
to people who want to dig into the subject (e.g. The Art of Capacity
Planning). However, when it comes down to actually doing it, I feel like
our developers are way too distant from the LOSAs. It might be that I
just never tried to get a new metric graphed, and I've never seen any
graph from Apache or HAProxy internally, though I trust that they exist
and someone is watching over them.

-- Sidnei

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp

