Mark Seger wrote:
I wasn't sure if this is the right place to ask, but I figured if not,
someone will certainly tell me. 8-)
The developers list should hold at least some answers to your questions.
From it you can get a pretty good sense of Ganglia's direction and
capabilities (not to mention limitations).
However, list archive spelunking gets old after a while. :)
From what I've seen on a few of the demo pages as well as internally
installed systems, I'm quite impressed with ganglia and what it does.
However, I've often seen people take something that works quite well
within a specific problem domain and try to extend it into places where
it wasn't intended to go and I want to make sure I don't do the same
thing. I've looked through the documentation and a subset of the mail
archives and still have some very basic questions, primarily around the
design principles/philosophy.
Ganglia's not (currently) competing with SNMP on all points. There is *no*
notification system at present - data's collected through polling only.
The metadaemon polls the monitoring cores every X seconds (the default is
15 seconds, I believe). This can be adjusted lower, but RRD stores
timestamps as a seconds-since-the-epoch integer, so updating twice in one
second is like crossing the streams - the whole RRD update fails. You will
also want to increase the number of stored data points
by tweaking the RRA definitions in gmetad/rrd_helper.c (IIRC).
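To make the one-second limitation concrete, here's a quick sketch (Python
driving the rrdtool command line; the file name, DS and RRA values are made
up for illustration, not Ganglia's real definitions):

import subprocess, time

# Create a throwaway RRD with a 15-second step, mirroring gmetad's default
# polling interval.
subprocess.run(["rrdtool", "create", "demo.rrd", "--step", "15",
                "DS:val:GAUGE:120:U:U", "RRA:AVERAGE:0.5:1:240"], check=True)

now = int(time.time())
subprocess.run(["rrdtool", "update", "demo.rrd", "%d:1.0" % now], check=True)

# Second update with the same integer timestamp: rrdtool exits non-zero
# with an "illegal attempt to update using time ..." error.
second = subprocess.run(["rrdtool", "update", "demo.rrd", "%d:2.0" % now])
print("second update exit status:", second.returncode)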
There's no reason to use gmetad at all - you can always poll the monitoring
core yourself, parse the XML and post the data to a data store of your own
choosing - flat file, database, et cetera.
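If you want to roll your own, the polling side is only a few lines. Here's
a rough sketch in Python, assuming gmond's default TCP port (8649) and the
HOST/METRIC elements with NAME and VAL attributes that I see in its XML
dump - check your own gmond output before trusting these names:

import socket
import xml.etree.ElementTree as ET

def poll_gmond(host="localhost", port=8649):
    # gmond dumps its entire state as one XML document to anyone who
    # connects, then closes the connection.
    chunks = []
    with socket.create_connection((host, port)) as s:
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

root = poll_gmond()
for h in root.iter("HOST"):
    for m in h.iter("METRIC"):
        # Push these wherever you like: flat file, database, et cetera.
        print(h.get("NAME"), m.get("NAME"), m.get("VAL"))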
The monitoring cores run their metric-gathering code (see
$GANGLIA_SRC/gmond/machines/$PLATFORM.c) at intervals set in one of the
gmond header files; based on how long it has been since a metric was last
transmitted and how much its value has changed, gmond decides whether or
not to send it.
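In pseudo-Python, that send decision looks roughly like this - the numbers
are invented, the real thresholds are compiled into gmond's headers:

def should_send(value, last_sent_value, seconds_since_sent,
                value_threshold=0.05, time_threshold=60):
    # Send if the value has moved by more than the threshold fraction...
    changed = abs(value - last_sent_value) > \
        value_threshold * max(abs(last_sent_value), 1.0)
    # ...or if it's simply been too long since the last transmission.
    overdue = seconds_since_sent >= time_threshold
    return changed or overdue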
Okay, now that I've laid all that out here it's going to play hell with the
rest of my quoting. :)
It would seem to me that ganglia's primary design target is the
real-time monitoring of large clusters with an eye towards coarser time
resolution - by that I mean resolutions measured in minutes as opposed
to seconds, at least that's what I see when I look at a display. In
other words, real-time system/network management type functions such as
identifying that a problem exists, where that problem is, and some
pretty good clues (as far as metrics go) to what may be going wrong.
All of the health-type status displays in the web front-end are deduced
from the timestamp of the last report from the host and its system load.
If that timestamp is too old, the node's marked as down. Otherwise,
the system load is compared to a hardcoded series of thresholds and the
appropriate color is assigned.
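In other words, something along these lines (the real thresholds are
hardcoded in the PHP front-end; the numbers below are only illustrative):

import time

def host_color(last_reported, load_one, now=None):
    now = now if now is not None else time.time()
    if now - last_reported > 60:     # no report recently -> host is down
        return "down"
    if load_one >= 1.0:              # heavily loaded
        return "red"
    if load_one >= 0.5:
        return "orange"
    return "green"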
I also didn't see anywhere where ganglia gets its core data. It feels
like it's from the /proc structures - perhaps this is more out of
curiosity. I know there are facilities to supply your own data, as
shown with the temperature monitoring example in the documentation, but
I didn't see anything about snmp. Is the idea here to just use
something like snmpget and feed it to ganglia the same way as
temperature?
On Linux it's almost completely done through /proc, but porters have tried
to use the most graceful, direct manner of collecting metrics possible on
each platform.
Except for me, I just hacked together whatever I could get working. :)
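As for the snmpget idea: the quick-and-dirty route would be exactly what
you describe - run snmpget, take the value, and hand it to gmetric just
like the temperature example. A rough sketch (the OID and community string
are placeholders, and I'm assuming net-snmp's -Oqv flag and gmetric's long
options here):

import subprocess

def snmp_to_gmetric(host, oid, metric_name, units):
    # -Oqv prints just the value, no OID or type prefix.
    out = subprocess.run(["snmpget", "-v1", "-c", "public", "-Oqv", host, oid],
                         capture_output=True, text=True, check=True)
    value = out.stdout.strip()
    # Multicast it into the cluster the same way the temperature example does.
    subprocess.run(["gmetric", "--name", metric_name, "--value", value,
                    "--type", "float", "--units", units], check=True)

snmp_to_gmetric("switch1", ".1.3.6.1.2.1.2.2.1.10.1", "ifInOctets_port1", "bytes")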
I get the impression that the main reason for the gmonds multicasting to
each other is to ensure that, in the event of failures, at least somebody
knows the state of the entire environment. Is that the reason this is
done? It therefore sounds like in a large environment one picks a small
subset of nodes for that function rather than having everyone
multicasting to everyone else.
That's true. Personally, though, I don't have any complaints about the
monitoring core's memory usage.
The problem space I'm interested in addressing is that of finer-grained
monitoring with an eye towards benchmarking and/or problem resolution,
where one often needs data points at a resolution of a second or two and
rarely more than 10. As for the number of data elements to monitor, I'd
set that number somewhere between 50 and 100; a good example would be
everything in SAR plus a variety of other statistics such as nfs and
iostat -x. For example, I run SAR on my systems at 10-second collection
intervals with minimal overhead. Specifically, I only see about 10
seconds of system time over the course of the day.
The next major release of Ganglia will include a plug-in architecture that
should make it easier to manipulate the monitoring core and metadaemon
functionality. There are going to be some major changes under the hood and
we're still battling some of them out on the developers list, most notably
the manner in which we describe and transmit metrics.
Ideally we would like the metadaemon to open a persistent connection to its
data source and have the source relay metrics to it as soon as they arrive.
It sure beats receiving and parsing a (possibly very large) data stream
every X seconds.
It seems to me that, in your situation, this method of connection would be
very useful, but you would of course be on your own as far as storing the
data in a useful manner.
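Purely hypothetical sketch of what that would look like from the consumer's
side - this is NOT how gmetad talks to gmond today, and the one-line-per-
metric format below is something I'm inventing for illustration:

import socket

def consume_stream(host, port, handle):
    # Long-lived connection; each metric arrives as a line
    # "hostname metric value timestamp" and is handled immediately,
    # instead of reparsing a full XML dump every polling interval.
    with socket.create_connection((host, port)) as s, s.makefile("r") as f:
        for line in f:
            hostname, metric, value, ts = line.split()
            handle(hostname, metric, float(value), int(ts))

# consume_stream("collector", 9999, lambda *rec: print(*rec))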
This also touches on the use of the rrd. I think it's really cool the
way it ages data for longer-term storage, but in the case of
troubleshooting I need complete data. Often it's used within a few days
of collecting, but at other times people may want to simply archive it
in its native form. Since an rrd must be created at a fixed size, perhaps
the answer here is to periodically archive and recreate the database,
but that sounds like an administrative nuisance.
As I said above, RRD's main limitations (besides being non-reentrant :) )
are its one-second update granularity and its static data structure - you
just have to be happy with the structure you give it up front.
There's no reason you couldn't "elongate" the highest-resolution RRA to 24
hours or more. The RRDs would be many times larger, but that amount of
fast storage should be cheap...
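Quick back-of-the-envelope: 24 hours at the default 15-second step is
24 * 3600 / 15 = 5760 rows, and RRD stores each consolidated value as an
8-byte double, so the full-resolution RRA costs roughly 45 KB per data
source. If you were building such an RRD by hand it would look something
like this (values illustrative; Ganglia's real definitions live in
gmetad's rrd_helper.c):

import subprocess

STEP = 15
ROWS = 24 * 3600 // STEP    # 5760 rows = one full day at 15-second resolution

subprocess.run([
    "rrdtool", "create", "cpu_user.rrd",
    "--step", str(STEP),
    "DS:sum:GAUGE:%d:U:U" % (STEP * 8),   # heartbeat of two minutes
    "RRA:AVERAGE:0.5:1:%d" % ROWS,        # the "elongated" high-resolution RRA
    "RRA:AVERAGE:0.5:24:2976",            # 6-minute averages for about 12 days
], check=True)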
Sorry for that long-winded discussion, but the bottom line is that while
my guess is I could probably configure ganglia to do what I want
(fine-grained, high number of variables) in a production environment,
the question is: should I? Is there any way to guess what kind of load
it would put on the system in terms of cpu/memory? I suspect if I
installed it I could measure it, but I also thought this might be a
useful discussion topic as well.
It's possible - using gmetric will be where you'll see the most overhead.
If you feel like throwing some development effort at the problem you should
be able to bolt gmetric's code onto your data collection mechanism and call
libganglia directly. I wrote such an app six months ago and found it to be
quite well-behaved.
Of course, I was trying to multicast a whole *table* of new metrics
(process information), and that didn't work out so well...
But the new version's metric space won't be flat, so my little app should
work then. I can rewrite it as a plug-in!
Hope you find some of this info helpful. I'm sure our regulars will weigh
in eventually. :)