Mark Seger wrote:
I wasn't sure if this is the right place to ask, but I figured if not,
someone will certainly tell me. 8-)
The developers list should hold at least some answers to your questions.
From it you can get a pretty good sense of Ganglia's direction and
capabilities (not to mention limitations).
However, list archive spelunking gets old after a while. :)
From what I've seen on a few of the demo pages as well as internally
installed systems, I'm quite impressed with ganglia and what it does.
However, I've often seen people take something that works quite well
within a specific problem domain and try to extend it into places where
it wasn't intended to go and I want to make sure I don't do the same
thing. I've looked through the documentation and a subset of the mail
archives and still have some very basic questions, primarily around the
design principles/philosophy.
Ganglia's not (currently) competing with SNMP on all points. There is *no*
notification system at present - data's collected through polling only.
The metadaemon polls the monitoring cores every X seconds (the default is
15 seconds, I believe). This can be adjusted lower, but RRD stores
timestamps as a seconds-since-the-epoch integer, so updating twice in one
second is like crossing the streams - the whole RRD update fails. You will
also want to increase the number of stored data points
by tweaking the RRA definitions in gmetad/rrd_helper.c (IIRC).
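To make the one-second limitation concrete, here's a quick sketch (Python
driving the rrdtool command line; the file name, DS and RRA values are made
up for illustration, not Ganglia's real definitions):

import subprocess, time

# Create a throwaway RRD with a 15-second step, mirroring gmetad's default
# polling interval.
subprocess.run(["rrdtool", "create", "demo.rrd", "--step", "15",
                "DS:val:GAUGE:120:U:U", "RRA:AVERAGE:0.5:1:240"], check=True)

now = int(time.time())
subprocess.run(["rrdtool", "update", "demo.rrd", "%d:1.0" % now], check=True)

# Second update with the same integer timestamp: rrdtool exits non-zero
# with an "illegal attempt to update using time ..." error.
second = subprocess.run(["rrdtool", "update", "demo.rrd", "%d:2.0" % now])
print("second update exit status:", second.returncode)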
There's no reason to use gmetad at all - you can always poll the monitoring
core yourself, parse the XML and post the data to a data store of your own
choosing - flat file, database, et cetera.
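If you want to roll your own, the polling side is only a few lines. Here's
a rough sketch in Python, assuming gmond's default TCP port (8649) and the
HOST/METRIC elements with NAME and VAL attributes that I see in its XML
dump - check your own gmond output before trusting these names:

import socket
import xml.etree.ElementTree as ET

def poll_gmond(host="localhost", port=8649):
    # gmond dumps its entire state as one XML document to anyone who
    # connects, then closes the connection.
    chunks = []
    with socket.create_connection((host, port)) as s:
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

root = poll_gmond()
for h in root.iter("HOST"):
    for m in h.iter("METRIC"):
        # Push these wherever you like: flat file, database, et cetera.
        print(h.get("NAME"), m.get("NAME"), m.get("VAL"))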
The monitoring cores run their metric-gathering code (see
$GANGLIA_SRC/gmond/machines/$PLATFORM.c) at intervals set in one of the
gmond header files; based on how long it has been since a metric was last
transmitted and how much its value has changed, gmond decides whether or
not to send it.
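In pseudo-Python, that send decision looks roughly like this - the numbers
are invented, the real thresholds are compiled into gmond's headers:

def should_send(value, last_sent_value, seconds_since_sent,
                value_threshold=0.05, time_threshold=60):
    # Send if the value has moved by more than the threshold fraction...
    changed = abs(value - last_sent_value) > \
        value_threshold * max(abs(last_sent_value), 1.0)
    # ...or if it's simply been too long since the last transmission.
    overdue = seconds_since_sent >= time_threshold
    return changed or overdue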
Okay, now that I've laid all that out here it's going to play hell with the
rest of my quoting. :)
It would seem to me that ganglia's primary design target is the
real-time monitoring of large clusters with an eye towards coarser time
resolution - by that I mean resolutions measured in minutes as opposed
to seconds, at least that's what I see when I look at a display. In
other words, real-time system/network management type functions such as
identifying that a problem exists, where that problem is, and some
pretty good clues (as far as metrics go) to what may be going wrong.
All of the health-type status displays in the web front-end are deduced
from the timestamp of the last report from the host and its system load.
If that timestamp is too old, the node's marked as down. Otherwise,
the system load is compared to a hardcoded series of thresholds and the
appropriate color is assigned.
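In other words, something along these lines (the real thresholds are
hardcoded in the PHP front-end; the numbers below are only illustrative):

import time

def host_color(last_reported, load_one, now=None):
    now = now if now is not None else time.time()
    if now - last_reported > 60:     # no report recently -> host is down
        return "down"
    if load_one >= 1.0:              # heavily loaded
        return "red"
    if load_one >= 0.5:
        return "orange"
    return "green"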
I also didn't see anywhere where ganglia gets its core data. It feels
like it's from the /proc structures - perhaps this is more out of
curiosity. I know there are facilities to supply your own data, as
shown with the temperature monitoring example in the documentation, but
I didn't see anything about snmp. Is the idea here to just use
something like snmpget and feed it to ganglia the same way as
temperature?
On Linux it's almost completely done through /proc, but porters have tried
to use the most graceful, direct manner of collecting metrics possible on
each platform.
Except for me, I just hacked together whatever I could get working. :)
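As for the snmpget idea: the quick-and-dirty route would be exactly what
you describe - run snmpget, take the value, and hand it to gmetric just
like the temperature example. A rough sketch (the OID and community string
are placeholders, and I'm assuming net-snmp's -Oqv flag and gmetric's long
options here):

import subprocess

def snmp_to_gmetric(host, oid, metric_name, units):
    # -Oqv prints just the value, no OID or type prefix.
    out = subprocess.run(["snmpget", "-v1", "-c", "public", "-Oqv", host, oid],
                         capture_output=True, text=True, check=True)
    value = out.stdout.strip()
    # Multicast it into the cluster the same way the temperature example does.
    subprocess.run(["gmetric", "--name", metric_name, "--value", value,
                    "--type", "float", "--units", units], check=True)

snmp_to_gmetric("switch1", ".1.3.6.1.2.1.2.2.1.10.1", "ifInOctets_port1", "bytes")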
I get the impression that the main reason for the gmonds multicasting to
each other is to ensure that, in the event of failures, at least somebody
knows the state of the entire environment. Is that the reason this is
done? It therefore sounds like in a large environment one picks a small
subset of nodes for that function rather than having everyone
multicasting to everyone else.
That's true. Personally, though, I don't have any complaints about the
monitoring core's memory usage.
The problem space I'm interested in addressing is that of finer-grained
monitoring with an eye towards benchmarking and/or problem resolution,
where one often needs data points at a resolution of a second or two and
rarely more than 10. As for the number of data elements to monitor, I'd
set that number somewhere between 50 and 100; a good example would be
everything in SAR plus a variety of other statistics such as nfs and
iostat -x. For example, I run SAR on my systems at 10-second collection
intervals with minimal overhead. Specifically, I only see about 10
seconds of system time over the course of the day.
The next major release of Ganglia will include a plug-in architecture that
should make it easier to manipulate the monitoring core and metadaemon
functionality. There are going to be some major changes under the hood and
we're still battling some of them out on the developers list, most notably
the manner in which we describe and transmit metrics.
Ideally we would like the metadaemon to open a persistent connection to its
data source and have the source relay metrics to it as soon as they arrive.
It sure beats receiving and parsing a (possibly very large) data stream
every X seconds.
It seems to me that, in your situation, this method of connection would be
very useful, but you would of course be on your own as far as storing the
data in a useful manner.
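Purely hypothetical sketch of what that would look like from the consumer's
side - this is NOT how gmetad talks to gmond today, and the one-line-per-
metric format below is something I'm inventing for illustration:

import socket

def consume_stream(host, port, handle):
    # Long-lived connection; each metric arrives as a line
    # "hostname metric value timestamp" and is handled immediately,
    # instead of reparsing a full XML dump every polling interval.
    with socket.create_connection((host, port)) as s, s.makefile("r") as f:
        for line in f:
            hostname, metric, value, ts = line.split()
            handle(hostname, metric, float(value), int(ts))

# consume_stream("collector", 9999, lambda *rec: print(*rec))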
This also touches on the use of the rrd. I think it's really cool the
way it ages data for longer-term storage, but in the case of
troubleshooting I need complete data. Often it's used within a few days
of collecting, but at other times people may want to simply archive it
in its native form. Since an rrd must be created at a fixed size, perhaps
the answer here is to periodically archive and recreate the database,
but that sounds like an administrative nuisance.
As I said above, RRD's main limitations (besides being non-reentrant :) )
are its one-second update granularity and its static data structure - you
just have to be happy with the structure you give it up front.
There's no reason you couldn't "elongate" the highest-resolution RRA to 24
hours or more. The RRDs would be many times larger, but that amount of
fast storage should be cheap...
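Quick back-of-the-envelope: 24 hours at the default 15-second step is
24 * 3600 / 15 = 5760 rows, and RRD stores each consolidated value as an
8-byte double, so the full-resolution RRA costs roughly 45 KB per data
source. If you were building such an RRD by hand it would look something
like this (values illustrative; Ganglia's real definitions live in
gmetad's rrd_helper.c):

import subprocess

STEP = 15
ROWS = 24 * 3600 // STEP    # 5760 rows = one full day at 15-second resolution

subprocess.run([
    "rrdtool", "create", "cpu_user.rrd",
    "--step", str(STEP),
    "DS:sum:GAUGE:%d:U:U" % (STEP * 8),   # heartbeat of two minutes
    "RRA:AVERAGE:0.5:1:%d" % ROWS,        # the "elongated" high-resolution RRA
    "RRA:AVERAGE:0.5:24:2976",            # 6-minute averages for about 12 days
], check=True)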
Sorry for that long-winded discussion, but the bottom line is that while
my guess is I could probably configure ganglia to do what I want
(fine-grained, high number of variables) in a production environment,
the question is: should I? Is there any way to guess what kind of load
it would put on the system in terms of cpu/memory? I suspect if I
installed it I could measure it, but I also thought this might be a
useful discussion topic as well.
It's possible - using gmetric will be where you'll see the most overhead.
If you feel like throwing some development effort at the problem you should
be able to bolt gmetric's code onto your data collection mechanism and call
libganglia directly. I wrote such an app six months ago and found it to be
quite well-behaved.
Of course, I was trying to multicast a whole *table* of new metrics
(process information), and that didn't work out so well...
But the new version's metric space won't be flat, so my little app should
work then. I can rewrite it as a plug-in!
Hope you find some of this info helpful. I'm sure our regulars will weigh
in eventually. :)