> Is there interest in formalizing a hierarchical naming convention for
> metrics in Ganglia?

I agree that Ganglia's existing methods are very simplistic.

On the positive side, they are very easy to understand, and they are both
sufficient and effective for simple situations.

On the other hand, there are various issues:

a) multiple instances - net devices, filesystems, CPUs (maybe you've
seen my recent release of ganglia-modules-solaris with per-core CPU
support and per-disk IO stats?).

b) dynamic names: do you really want to see `net_bytes_out_eth1' if eth1
is a USB device and tomorrow it might appear as eth2 or eth3?  Or does
Ganglia need to have some mapping functionality, so the name would
appear as `net_bytes_out_wan' no matter what physical device name was
used?  The same issue applies to filesystems.

c) use of an existing hierarchy: could we borrow from SNMP and use the
OID, for example?  Maybe a future version of Ganglia could just be a
multicast transport for SNMP, and the gmetad would just poll the normal
SNMP daemon to get the mappings of OID->real device names.

d) adding or removing devices (e.g. USB net or storage, virtual devices
on a VM, provisioning a SAN filesystem over fibre channel) while Ganglia
is running - at a very simplistic level, gmond could just restart itself
when it notices a change, but if a system is very dynamic, it could
appear that the daemon is flapping.

e) application-specific monitoring: e.g. you run two UAT environments, a
demo environment and a production environment.  Each application
instance is a JVM.  You move the UAT environments around between
different servers, but you want to keep all the history from each JVM
and associate it with the name of the environment rather than the name
of the server.

f) excluding some things from aggregation: in the per-core CPU
monitoring, it doesn't mean anything to look at an aggregation of core
no. 3 from each of your 10 hosts, especially if 4 of the hosts only have
2 cores.

g) common solution with Nagios and other technologies: it may also be
desirable to have some naming convention (with meta-data support) that
can be shared, for example, something that could be used by Nagios,
preferably with enough meta-data to allow auto-configuration of things
that should be monitored.
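
To make (b) and (f) a bit more concrete, here is a rough Python sketch of
what I have in mind.  None of this exists in Ganglia today: the dotted
naming scheme, the DEVICE_ALIASES table, the `aggregate' flag and the
function names are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical alias table: physical device name -> stable logical role.
# In practice this would come from configuration, not be hard-coded.
DEVICE_ALIASES = {"eth1": "wan"}


@dataclass(frozen=True)
class Metric:
    name: str               # hierarchical name, e.g. "net.wan.bytes_out"
    value: float
    aggregate: bool = True  # False for per-instance metrics (e.g. per-core CPU)


def net_metric_name(device, counter, aliases=DEVICE_ALIASES):
    """Build a hierarchical name, using the logical role when one is mapped."""
    role = aliases.get(device, device)  # fall back to the physical name
    return f"net.{role}.{counter}"


def cluster_sum(metrics_by_host, name):
    """Sum a metric across hosts, skipping any sample marked non-aggregatable.

    Returns None when no host contributes an aggregatable sample.
    """
    total = 0.0
    found = False
    for metrics in metrics_by_host.values():
        for m in metrics:
            if m.name == name and m.aggregate:
                total += m.value
                found = True
    return total if found else None
```

With this, net_metric_name("eth1", "bytes_out") gives "net.wan.bytes_out"
regardless of which physical name the USB device gets tomorrow (once the
alias table is updated), and a per-core metric published with
aggregate=False is simply left out of cluster-wide summaries.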

My feeling is that all these types of issues should go on a roadmap for
Ganglia 4 or beyond.  It is probably not possible to address them all in
one go, but if they are factored into the next iteration of the
protocol, then they can be added incrementally.

Ganglia-general mailing list