Re: [Ganglia-general] Monitoring

Leif Nixon Tue, 08 Oct 2002 00:05:54 -0700

Steven Wagner <[EMAIL PROTECTED]> writes:

> Leif Nixon wrote:
> > Steven Wagner <[EMAIL PROTECTED]> writes:
> > Yes, that's what I did last week. It ain't no fun. Nagios' handling
> > of passive service checks isn't flexible enough. And passive host
> > checking Just Isn't Done.
> 
> Once again, considering you have the source at your disposal, I'm sure
> you could work something out.  Spackling in passive host checking is
> easier than some of the alternatives. :)


I'm not sure about that. Cue Ethan Galstad, Nagios' creator:

  "I am investigating the possibility of adding passive host checks in 
  2.0.  However, allowing passive checks opens a whole new can of worms 
  as far as host check logic is concerned.  For instance, if a host is 
  reported (passively) as being down (it was previously up), what 
  should happen with child hosts?  Should those be actively checked 
  according to the current tree traversal logic?  Also, host checks are 
  performed on-demand only (synchronously), so how do you handle 
  asynchronous results?  Host checks also get priority over pending 
  passive service check results, so that has to be figured out.

  Anyway, it isn't exactly trivial without changing a good portion of 
  how the host check logic works.  I'll be looking into it though..."

I don't think I want to dive that deep into Nagios just to make it do
something it really isn't designed to do.

> > Well, each metric could certainly come with default thresholds,
> > and if you use some inheritance mechanism you could rather easily
> > specify thresholds for all your cluster nodes:
> 
> In a per-node model you have to distribute the new config file to n
> nodes every time you change something.  Which is kind of a bummer,
> since (as I mentioned before) it seems that there's always an initial
> tweaking period with notifying mechanisms where you're changing the
> config every five minutes.

I'm not sure I see the point in distributing the threshold information.
As you said, the actual notifications will be issued from a central 
host, so why not just keep the threshold configuration there?
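
To make that concrete, here is a minimal sketch of centrally held thresholds with a simple inheritance chain (global defaults, then per-cluster, then per-host). Every name and number in it is hypothetical; nothing like this exists in Ganglia or Nagios as far as this thread goes.

```python
# Hypothetical central threshold table. All metric names, cluster names,
# host names and limits below are made up for illustration.
DEFAULTS = {"load_one": 4.0, "disk_free_pct": 10.0}

# Only the exceptions from the defaults need to be listed.
CLUSTER_OVERRIDES = {"compute": {"load_one": 8.0}}
HOST_OVERRIDES = {"compute-17": {"load_one": 16.0}}


def thresholds_for(cluster, host):
    """Resolve thresholds for one host: defaults < cluster < host."""
    merged = dict(DEFAULTS)
    merged.update(CLUSTER_OVERRIDES.get(cluster, {}))
    merged.update(HOST_OVERRIDES.get(host, {}))
    return merged
```

Since the table lives only on the notifying host, changing a threshold means editing one file there, and nothing has to be pushed out to the nodes.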

> > That way, you only need to specify any exceptions from the defaults.
> > Whooshy enough?
> 
> The mental image I was actually going for was the loading program from
> The Matrix, substituting endless streams of configuration directives
> for racks o' firearms...

Yes, obviously. So I showed how a few configuration lines (cut to
Tank, rapidly typing) could specify load thresholds for an entire
metacluster (WHOOSH). 8^)

> *That* stuff needs to be in gmetad (or a program that fills the same
> niche, querying one or more metadaemons or monitoring cores, chewing
> on the XML data, and doing something with it).  Flap thresholds,
> contact info, etc., etc., etc. ...
> 
> Sounds like you're volunteering to write it. :P

Here I was, hoping I could inspire someone else. 8^)
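
For whoever does get inspired, the "chewing on the XML data" part might start out roughly like the sketch below. It assumes the XML dump uses CLUSTER, HOST and METRIC elements with NAME and VAL attributes; the sample document, the function name and the limit are invented for illustration, and a real program would read the XML from a metadaemon rather than from a string.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for the XML a metadaemon would serve.
SAMPLE_XML = """<GANGLIA_XML SOURCE="gmetad">
  <CLUSTER NAME="compute">
    <HOST NAME="node1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
    </HOST>
    <HOST NAME="node2">
      <METRIC NAME="load_one" VAL="9.75" TYPE="float"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

LOAD_ONE_MAX = 4.0  # hypothetical threshold


def hosts_over_threshold(xml_text, metric, limit):
    """Return (cluster, host, value) for every host whose metric exceeds limit."""
    root = ET.fromstring(xml_text)
    offenders = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for m in host.iter("METRIC"):
                if m.get("NAME") != metric:
                    continue
                value = float(m.get("VAL"))
                if value > limit:
                    offenders.append((cluster.get("NAME"), host.get("NAME"), value))
    return offenders


for cluster, host, value in hosts_over_threshold(SAMPLE_XML, "load_one", LOAD_ONE_MAX):
    print(f"{cluster}/{host}: load_one={value} exceeds {LOAD_ONE_MAX}")
```

Flap thresholds, contact info and the actual notification are all left out; this only covers the step of turning the XML into a list of offending hosts.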

-- 
Leif Nixon                                    Systems expert
------------------------------------------------------------
National Supercomputer Centre           Linkoping University
------------------------------------------------------------
