Re: [Ganglia-general] Monitoring

Steven Wagner Mon, 07 Oct 2002 10:21:17 -0700

Leif Nixon wrote:

Steven Wagner <[EMAIL PROTECTED]> writes:


Yes, that's what I did last week. It ain't no fun. Nagios' handling
of passive service checks isn't flexible enough. And passive host
checking Just Isn't Done.

Once again, considering you have the source at your disposal, I'm sure you could work something out. Spackling in passive host checking is easier than some of the alternatives. :)

The ganglia philosophy so far has been to make things work with a
minimum of tweaking.  Having set up three different open source
monitoring systems over the last few years, it seems to me that it's
nearly impossible to set up notifications without a *LOT* of tweaking.

So there's two ways of doing this, I think:

*  We need config files.  Lots of them.   [WHOOSH!]  (1 per node?)
*  Monitoring thresholds are hard-coded as part of each metric definition.



Well, each metric could certainly come with default thresholds, and if
you use some inheritance mechanism you could rather easily specify
thresholds for all your cluster nodes:

In a per-node model you have to distribute the new config file to n nodes every time you change something. Which is kind of a bummer, since (as I mentioned before) it seems that there's always an initial tweaking period with notifying mechanisms where you're changing the config every five minutes.

On the other hand, this will encourage people in ad-hoc cluster environments to put together a file distribution mechanism. :)

That way, you only need to specify any exceptions from the defaults.
Whooshy enough?
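
A minimal sketch of the inheritance idea, assuming three layers (per-metric defaults, per-cluster overrides, per-node exceptions). The metric names, node names and numbers are made up for illustration; nothing here is an existing ganglia config format.

```python
from collections import ChainMap

# Hypothetical threshold tables; names and values are illustrative only.
METRIC_DEFAULTS = {"load_one": 4.0, "disk_free_pct": 10.0}  # shipped with each metric
CLUSTER_OVERRIDES = {"load_one": 8.0}                       # applies to every node
NODE_OVERRIDES = {"node17": {"load_one": 16.0}}             # exceptions only


def thresholds_for(node):
    """Most specific setting wins: node, then cluster, then metric default."""
    return ChainMap(NODE_OVERRIDES.get(node, {}), CLUSTER_OVERRIDES, METRIC_DEFAULTS)


print(thresholds_for("node17")["load_one"])       # node-level exception
print(thresholds_for("node03")["load_one"])       # cluster-wide value
print(thresholds_for("node03")["disk_free_pct"])  # untouched metric default
```

Only node17 needs an entry of its own; every other node falls through to the cluster or metric level.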

The mental image I was actually going for was the loading program from The Matrix, substituting endless streams of configuration directives for racks o' firearms...

What would seem to take some consideration is how to keep track of the
metacluster state.

That part's easy, it goes in the gmetad config file. gmetad inherits per-node and per-metric attributes from the data source, but uses a "generic" section of the gmetad config file for metacluster notification properties and the data source section of the gmetad config to determine cluster notification properties.

You need state tracking, since you want flank detection so you trigger
the klaxons only when a node goes down, and *not* every five minutes
during its downtime. And for most metrics you want some hysteresis
mechanism so you don't get continuous notifications if a metric
fluctuates around the threshold.
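
A rough sketch of the two mechanisms together, assuming one small state object per node/metric pair. The class name `Notifier` and the `raise_at`/`clear_at` parameters are hypothetical. Flank detection means an event is returned only when the state changes; hysteresis means the alarm is raised at one level and cleared at a lower one.

```python
class Notifier:
    """Tracks alarm state for one metric on one node (hypothetical helper)."""

    def __init__(self, raise_at, clear_at):
        # The gap between clear_at and raise_at is the hysteresis band.
        if clear_at >= raise_at:
            raise ValueError("clear_at must be below raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.alarmed = False

    def update(self, value):
        """Return "ALARM" or "CLEAR" on a state change, otherwise None."""
        if not self.alarmed and value >= self.raise_at:
            self.alarmed = True
            return "ALARM"
        if self.alarmed and value <= self.clear_at:
            self.alarmed = False
            return "CLEAR"
        return None  # no edge, so no notification


load = Notifier(raise_at=8.0, clear_at=6.0)
events = [load.update(v) for v in [5.0, 8.5, 7.9, 8.1, 7.0, 5.5, 6.5]]
print(events)  # [None, 'ALARM', None, None, None, 'CLEAR', None]
```

The samples wobbling around 8.0 produce a single ALARM, and nothing further is sent until the value drops to 6.0 or below.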

*That* stuff needs to be in gmetad (or a program that fills the same niche, querying one or more metadaemons or monitoring cores, chewing on the XML data, and doing something with it). Flap thresholds, contact info, etc., etc., etc. ...
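
For the "chewing on the XML" part, something along these lines might do as a starting point. The sample document is hand-written and heavily trimmed; the element and attribute names (CLUSTER, HOST, METRIC, NAME, VAL) are assumed to match what the daemons emit, and `over_threshold` is a hypothetical helper, not part of ganglia.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real output carries many more attributes and metrics.
SAMPLE = """<GANGLIA_XML>
 <CLUSTER NAME="web">
  <HOST NAME="node01"><METRIC NAME="load_one" VAL="0.42" TYPE="float"/></HOST>
  <HOST NAME="node02"><METRIC NAME="load_one" VAL="9.10" TYPE="float"/></HOST>
 </CLUSTER>
</GANGLIA_XML>"""


def over_threshold(xml_text, metric, limit):
    """Return (cluster, host, value) for every host whose metric exceeds limit."""
    root = ET.fromstring(xml_text)
    hits = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for m in host.iter("METRIC"):
                if m.get("NAME") == metric and float(m.get("VAL")) > limit:
                    hits.append((cluster.get("NAME"), host.get("NAME"), float(m.get("VAL"))))
    return hits


print(over_threshold(SAMPLE, "load_one", 8.0))  # [('web', 'node02', 9.1)]
```

Feeding the hits into per-host state tracking, rather than notifying directly, is what keeps the klaxons from going off on every poll.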


Sounds like you're volunteering to write it. :P
