On Dec 8, 2008, at 11:38 AM, Toussaint OTTAVI wrote:

> Hi list,
>
> I've been investigating this problem for a while, but I couldn't
> find a good solution.
>
> * Example situation :
> Assume I have one host with 20 service checks.
>
> * Problem :
> If the host becomes DOWN, Nagios still continues to do service
> checks on this host. So, after a while, all the services will go to
> a CRITICAL state. Then, in my console, I will see :
> - 1 Host down,
> - 20 Services down
> This information is not pertinent. The only information I would see
> in such a case is the "host down". The 20 "service down"
> informations are obvious, and generate a "visual pollution" that may
> prevent to easily identify the problem.

Nagios is first and foremost a service monitor, not a host monitor. Host monitoring is only necessary, as far as Nagios is concerned, for two reasons:

- notification suppression. If the host is down, don't notify about the services. They're still down, so show them as down, but don't wake anybody up over it if they're not also responsible for the host.
- parenting/unreachable logic.

Nagios is designed to show the current state of services as accurately as possible. This helps explain the 'why' of the behavior you are seeing, and it works very well to cover the edge cases that your goal won't catch. For example, if your host check is a ping and something borks ICMP on your network, you would have all the services on that host disabled and set to unknown, even though they are working just fine. Your understanding of exactly what is impacted on that host is now completely wrong. By artificially changing the service state, you also make your reporting unreliable. You may be fine with that, but understand that your goal is the opposite of what Nagios is meant to do.

> * Expected behavior :
> When a host is down, I would like to :
> - See only one thing in red in the console : 1 HOST DOWN
> - Disabling all the service checks (which at this point do not have
> any chance of success)
> - Put the service into "UNKNOWN" status

This kind of methodology is just about the opposite of what Nagios is designed to do. While you may be able to do it with creative event handlers and modifications to your notification scripts, it's a square-peg-in-round-hole task.

Instead of disabling the service checks, you may be able to use adaptive monitoring to change the service check_commands to something that always returns UNKNOWN (e.g. check_dummy). This of course assumes that you are using regularly scheduled host checks (otherwise Nagios would never check your host state again) and that you're able to glean what the current check_command is for each service.
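
For what it's worth, the check_command swap could be sketched roughly like this in Python. It is only a sketch under assumptions: it formats CHANGE_SVC_CHECK_COMMAND external commands and writes them to the Nagios command file. The command file path, the `check_dummy!3` command definition, and the helper names are hypothetical and would have to match your own configuration. It also assumes external commands are enabled and that you have already saved each service's original check_command somewhere (the `original_commands` mapping).

```python
import time

# Assumed path; the real location is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Hypothetical: assumes a command definition named "check_dummy" that wraps
# the check_dummy plugin and passes $ARG1$ as the state (3 = UNKNOWN).
DUMMY_COMMAND = "check_dummy!3"


def change_svc_check_command(host, service, check_command, now=None):
    """Format one CHANGE_SVC_CHECK_COMMAND external command line."""
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] CHANGE_SVC_CHECK_COMMAND;"
            f"{host};{service};{check_command}\n")


def swap_to_dummy(host, original_commands, now=None):
    """Point every service on the host at the dummy command.

    original_commands maps service description -> normal check_command.
    """
    return [change_svc_check_command(host, svc, DUMMY_COMMAND, now)
            for svc in original_commands]


def restore(host, original_commands, now=None):
    """Put each service back on its saved check_command."""
    return [change_svc_check_command(host, svc, cmd, now)
            for svc, cmd in original_commands.items()]


def submit(lines, command_file=COMMAND_FILE):
    """Write the formatted commands to the Nagios external command file."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)
```

A host event handler could call `submit(swap_to_dummy(...))` when the host goes DOWN and `submit(restore(...))` when it comes back UP.
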
When the host recovers, change the check_command back to whatever it was before for each service.

--
Marc

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null