Hi Alex,
On Thursday 08 February 2007 18:31, Alex Balk wrote:
> Why would you consider gmetrics to be "second class"? I find those to be
> much more useful than the builtin metrics.
Ah, sorry; a slight misunderstanding here.
I'm considering gmetric metrics as "second class" just because of how the
metrics are XDR-encoded for their transport to a gmond. I'm not saying
they're less useful than the gmond-provided metrics, just that they're
treated differently when encoded into the UDP multicast packet.
gmetric metrics are always encoded as strings ("string value<>" in
protocol.x). For number-based metrics, this can be very inefficient in
terms of space. A binary format would be better, and this is what the
"core" metrics use.
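To give a rough feel for the size difference, here is a minimal Python sketch of the general XDR rules (big-endian 4-byte words, strings carried as a length word plus data padded to a 4-byte boundary). It is not ganglia's encoder and only covers the value field, not the rest of the message; the helper names and sample values are made up for illustration.

```python
import struct


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte big-endian length, the bytes, zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_float(value: float) -> bytes:
    """XDR float: 4-byte IEEE single precision, big-endian."""
    return struct.pack(">f", value)


def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


# Hypothetical sample values, chosen only to show the size difference.
load_one = 0.73
bytes_in = 1234567890

print(len(xdr_string("%.2f" % load_one)), "bytes as string,",
      len(xdr_float(load_one)), "bytes as float")
print(len(xdr_string(str(bytes_in))), "bytes as string,",
      len(xdr_uint(bytes_in)), "bytes as uint")
```

Under these rules the string form of a value is never smaller than the 4-byte binary form, and it grows with the number of digits.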
Currently, only "core" metrics can be binary encoded. One can only extend the
list of core metrics by forking ganglia and making a version that's pretty much
incompatible with the ganglia.sf.net version (with the caveat that a
ganglia.sf.net-gmond can still publish to the new gmond, allowing some level
of interoperability). This is what the Swiss binary-only Windows version of
gmond does.
Simply allowing gmetric data to be binary encoded would be a great
improvement; but, perhaps a better approach would be to rethink the encoding
so there's no explicit mention of gmond metrics: none of the metrics would be
more important than others.
> In fact, the only thing
> that's nice about builtin metrics (other than the fact that you get them
> out-of-the-box) is that they get reported even when the machine is under
> extremely high load.
Hmmm: interesting. How are you sending gmetric data? Are you running gmetric
from cron? How are you generating the metrics? Do you know how much
overhead is involved sending gmetric data?
I've my own low-overhead gmetric-like solution (monami) that should just keep
on chuggin', like gmond, even when the machine is heavily loaded.
So, I'm interested in how monami does in difficult high-load situations.
Perhaps we could talk more off-line about this if it's going a little
off-topic for the list.
> I view Ganglia as a framework rather than a performance monitoring
> solution. Anything can be encoded in the UDP messages through the use of
> gmetric, and specialized web interfaces can be quite easily built
> through the use of the "custom metrics addon" I'd written a while back (
> http://wtf.ath.cx/screenshots.html ). So you can not only access the XML
> data on machine, cluster & grid levels, you can also generate whatever
> UIs you want for your users... regardless of whether the metrics are
> hardcoded or brought in from gmetric.
Yes, sure. I'd imagine some people look at Ganglia as a complete solution,
others as a building block.
It's really just a question of at what point the distinction between gmond-
and gmetric-generated metrics is lost.
When gmetad reads XML from a gmond, data from both sources is pretty much
equivalent (except for the SOURCE attribute). After that, the data should
hit rrdtool as equivalent.
Thereafter, the only difference is that the default web pages are set up to
expect core metrics and display them nicely. This is where your custom metrics
addon comes in useful.
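To illustrate the point about the SOURCE attribute, here is a small Python sketch that groups METRIC elements by SOURCE. The XML fragment is hand-written and heavily trimmed, with attribute names as I remember them from gmond's output; the host and the queue_length metric are invented, so treat the layout as an assumption rather than a reference.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written, trimmed sample; real gmond output carries more attributes.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0.4" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node1.example.org" IP="192.0.2.10">
      <METRIC NAME="load_one" VAL="0.73" TYPE="float" UNITS="" SOURCE="gmond"/>
      <METRIC NAME="queue_length" VAL="42" TYPE="uint32" UNITS="jobs" SOURCE="gmetric"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def metrics_by_source(xml_text: str) -> dict:
    """Group (name, value) pairs by the SOURCE attribute of each METRIC element."""
    grouped = defaultdict(list)
    for metric in ET.fromstring(xml_text).iter("METRIC"):
        source = metric.get("SOURCE", "unknown")
        grouped[source].append((metric.get("NAME"), metric.get("VAL")))
    return dict(grouped)


for source, metrics in metrics_by_source(SAMPLE_XML).items():
    print(source, metrics)
```

Apart from that one attribute, both metrics look the same to anything reading the XML.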
> The only thing left on my wishlist is support for 64bit metrics, to
> achieve better scalability over aggregated data.
BTW, you might want to add a wish-list/todo bugzilla entry, just so it doesn't
get lost.
Cheers,
Paul.