On Sat, May 8, 2010 at 1:49 AM, Sidnei da Silva <[email protected]> wrote:
> Hi Robert,
>
>> Now, I'm not suggesting we go out and invent such a dashboard itself -
>> there's going to be a tonne of investment needed to do that, but
>> perhaps there is an open source version of this out there already for
>> zope apps? Or perhaps we could look at providing a zope plugin to talk
>> to newrelic?
>
> I'd go as far as saying that there would be nothing Zope-specific to
> collecting the kind of metrics that would be interesting.

I think you're wrong there, but I do agree that there are many non-Zope
metrics. Some Zope metrics:
 - template engine time.
 - ORM time [while you can claim it's Storm, I think in terms of
   'appserver' here; you'd want it glued together well and outputting in
   sync with the Zope transaction ending; so we definitely need *some*
   Zope glue to do it]
 - accept() backlog delay

> The great
> majority of stats would be collected at the HAProxy/Squid/Apache
> level, and the remaining ones would likely be Storm or other
> subsystems like RabbitMQ for Landscape. Maybe finding out if the
> threads of a certain Zope app server are exhausted would be useful, but
> that's the only thing that comes to my mind.

Grabbing stats for a single request across haproxy + squid + apache
would be *awesome*: SSL handshake time, cache lookup time, etc. Oh, and
we need to add memcached too these days.

> The situation is the same. We have a way to collect some metrics but
> if they are not aggregated there's not much point in having them
> except during development.

That's exactly it!

> I have a lot of interest in the subject of collecting metrics and
> analyzing bottlenecks, I even have a book or two around that I
> recommend for people that want to dig into the subject (eg: The Art of
> Capacity Planning). However, when it comes down to actually doing it I
> feel like our developers are way too distant from the LOSAs. It might
> be that I just never tried to get a new metric graphed, and I've never
> seen any graph from Apache or HAProxy internally, though I trust that
> they exist and someone is watching over them.

The thing about the tuolumne graphs and nagios meters is that they are
very manual: you can't 'drill down' into a bad metric to find where it's
coming from, unless the lower data is already configured in Just The
Right Way. Key metrics for crisis handling and detection are great;
they aren't great for exploring things - and that's what I feel we're
missing as developers.
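
To make the per-request idea a bit more concrete, here is a rough sketch
of the kind of appserver-side glue I have in mind. It is only an
illustration under stated assumptions: `RequestTimeline`, `timed` and the
category names ("template", "orm") are hypothetical, and nothing here is
an existing Zope, Storm or Launchpad API. It uses only the Python 3
standard library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestTimeline:
    """Accumulate time spent per category during a single request.

    Hypothetical helper: one instance would be created per request and
    its summary emitted when the request's transaction ends.
    """

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the class can be tested deterministically.
        self._clock = clock
        self.totals = defaultdict(float)  # category -> seconds spent
        self.counts = defaultdict(int)    # category -> number of timed blocks

    @contextmanager
    def timed(self, category):
        """Time the enclosed block and charge it to `category`."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the block raised.
            self.totals[category] += self._clock() - start
            self.counts[category] += 1

    def summary(self):
        """Return {category: (count, total_seconds)} for this request."""
        return {c: (self.counts[c], self.totals[c]) for c in self.totals}
```

The template engine and the ORM layer would each wrap their work in
`with timeline.timed("template"):` or `with timeline.timed("orm"):`, and
the summary would be shipped somewhere it can be aggregated per page, so
that a bad top-level number can be broken down after the fact.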

-Rob

