Jay R. Ashworth wrote:
>>> On Wed, Jun 11, 2008 at 05:54:48PM +0200, Andreas Ericsson wrote:
>>>>> Well, since (to take an example), CRITICAL load means "a load average over 8" (on my 8-core Opteron), and we don't *know* the load average if the machine isn't reachable to return a value... then the nrpe checker on the console in fact *is* getting an IO error when trying to, ok, read from a network socket.
>>>> I was thinking more along the lines of errno being set to EIO when attempting to read(2) from an already connected network socket, although there are two schools of thought on that too (some want all failures to always alert, while others want a lot of things to be in the UNKNOWN state).
>>>>
>>>> Not being able to connect clearly signals that there is something wrong with the service, though, while an EIO signals that there's something wrong with the Nagios host's kernel or hardware.
>>> My problem with that is that not all of what Nagios monitors is "services", in the meaning we usually give to that term. Much of it is "attributes" -- load average and disk space on a machine being great examples.
>> True that, but the service of storing a file on disk (or, for some retarded filesystems, reading one from a disk) requires a minimum of free space to be available. It's what makes up the platform on which the *real* services rest. Hence servicegroups (which together make up what a service provider would call a service).
>
> Could you expand on that? Do you mean to imply that a good use for a servicegroup is "all the physical services upon which my public website rests", as I think I read in your reply there?
>
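The CRITICAL-versus-UNKNOWN question above comes down to which exit status a check returns: Nagios plugins report 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. Here is a minimal Python sketch of a local load check that follows the "no value means UNKNOWN" school. Assumptions: CRITICAL above 8 comes from the 8-core example in the thread, the WARNING threshold of 6 is arbitrary, and the function names are hypothetical. It only reads the local load via `os.getloadavg()`; the NRPE/network side is not modeled.

```python
import os

# Nagios plugin exit codes: the process exit status is the service state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_load(load1, warn=6.0, crit=8.0):
    """Map a 1-minute load average to a Nagios state (hypothetical helper).

    load1 is None when no value could be obtained; that case is reported
    as UNKNOWN instead of being treated as a breached threshold.
    """
    if load1 is None:
        return UNKNOWN
    if load1 > crit:   # "over 8" -> CRITICAL, as in the 8-core example
        return CRITICAL
    if load1 > warn:   # warn=6.0 is an arbitrary assumption
        return WARNING
    return OK


def read_local_load():
    """Return the 1-minute load average, or None if it cannot be read."""
    try:
        return os.getloadavg()[0]
    except (OSError, AttributeError):  # unobtainable, or unsupported platform
        return None


def main():
    """Print a one-line plugin status and return the state code."""
    load1 = read_local_load()
    state = classify_load(load1)
    detail = "load unavailable" if load1 is None else f"load1={load1:.2f}"
    print(f"LOAD {STATE_NAMES[state]} - {detail}")
    return state
```

To run it as an actual plugin, end the script with `sys.exit(main())` (after `import sys`) so Nagios sees the state as the exit status. Someone in the "all failures should alert" school would return CRITICAL instead of UNKNOWN for the `None` case, which is a one-line change in `classify_load`.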
Yes, that's what I mean. Groups are first and foremost a visual aid (never mind configuration, as that can be scripted). Having some random point-and-click monkey on duty watching the servicegroup summary will give you a quick warning of what the users will claim has broken down when they call to complain.
>>> IMHO, anything you're trying to monitor that's actually a "service" -- i.e. a public-facing website -- shouldn't be directly attached to a host, anyway...
>>>
>>> What if you're Google? Which host do you attach "http://www.google.com" to?
>> All the query distributors (Google works by having several front-end servers distributing the incoming queries to quite a large army of query responders, which have access to the GDFS (Google distributed filesystem) for doing the actual lookups). Since a monitoring tool is only worth something if it tells you *where* things break rather than only that things are broken, that makes perfect sense for a monitoring system, even if that's not the case for the service provider or its salespeople.
>
> Ok, Google was a poor choice.
>
> Conversely, though, there may be cases where... or maybe there aren't. Let me muse on this some more.
>

Muse away. :) I'm fairly convinced the net ops team will still demand a system that shows them where the problem is, though, while the IT support team just wants to know what to say when the customers/users/whatever call in and claim "the mail isn't working" (servicegroups help there).

--
Andreas Ericsson                   [EMAIL PROTECTED]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
