Here's my config. It's functional:

define command{
    command_name    check-cluster-health
    command_line    /usr/lib/nagios/plugins/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

define service{
    service_description    check-cluster-health
    host                   app-proxy
    check_command          check-cluster-health!"App Thread Health"!0!1!$SERVICESTATEID:app-1:mongrel-count$,$SERVICESTATEID:app-2:mongrel-count$,$SERVICESTATEID:app-3:mongrel-count$,$SERVICESTATEID:app-4:mongrel-count$
    use                    serviceClusterTemplate
}

define service{
    service_description    mongrel-count
    hostgroup              app-servers,manager-servers
    check_command          check_nrpe_1arg!check_mongrel_count
    notifications_enabled  0
    use                    serviceClusterTemplate
}

-lee

On Tue, Feb 24, 2009 at 5:18 PM, Chris Beattie <cbeat...@geninfo.com> wrote:
> I need some help understanding the check_cluster plugin, please. I’m using
> version 1.4.13 of the plugins on Nagios 3.10, all compiled from source on
> 64-bit CentOS 5.2. We use VMWare ESX clusters, and I’d like the hosts in
> Nagios that happen to be virtual machines to have one parent instead of a
> list of parents comprising every ESX host in the cluster. Recently, an ESX
> host was moved from one cluster to another, so I had to change a lot of
> parents. If there’s a better way to represent VMs and their hosts, I’m open
> to suggestions too.
>
> I don’t have any problem running it as the Nagios user from the command line
> and feeding it states, like so:
>
> ./check_cluster --host --data=0,0,2,1 --warning=0 --critical=1
> CLUSTER CRITICAL: Host cluster: 2 up, 1 down, 1 unreachable
>
> ./check_cluster --host --data=0,0,0,0 --warning=0 --critical=1
> CLUSTER OK: Host cluster: 4 up, 0 down, 0 unreachable
>
> ./check_cluster --host --data=0,0,0,1 --warning=0 --critical=1
> CLUSTER WARNING: Host cluster: 3 up, 1 down, 0 unreachable
>
> Adding --verbose just says “check_cluster - Warning: start=0 end=0;
> Critical: start=0 end=1” first.
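
Those three results look consistent with the plugin counting the non-OK
states in --data and comparing that count against the two thresholds. A
rough Python sketch of that reading follows. It is not the plugin's source:
the function name cluster_status is made up for illustration, and the rule
"alert when the count exceeds the threshold" is an assumption inferred only
from the three runs quoted above.

```python
def cluster_status(states, warning, critical):
    """Classify a host cluster from Nagios host state IDs.

    State IDs: 0 = up, 1 = down, 2 = unreachable.
    Assumption: the check alerts when the number of non-up hosts
    is strictly greater than the given threshold.
    """
    up = states.count(0)
    down = states.count(1)
    unreachable = states.count(2)
    problems = down + unreachable

    if problems > critical:
        level = "CRITICAL"
    elif problems > warning:
        level = "WARNING"
    else:
        level = "OK"

    return (f"CLUSTER {level}: Host cluster: "
            f"{up} up, {down} down, {unreachable} unreachable")


# Replay the three command-line runs quoted above.
for data in ("0,0,2,1", "0,0,0,0", "0,0,0,1"):
    states = [int(s) for s in data.split(",")]
    print(cluster_status(states, warning=0, critical=1))
```

With --warning=0 --critical=1, one non-up host gives WARNING and two give
CRITICAL, which matches what you saw.
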
>
> However, if I try anything with the $HOSTSTATEID$ macro, everything is
> always OK, even if I just make up host names:
>
> ./check_cluster --host --data=$HOSTSTATEID:duck$,$HOSTSTATEID:cow$,$HOSTSTATEID:chicken$ --warning=0 --critical=1
> CLUSTER OK: Host cluster: 3 up, 0 down, 0 unreachable
>
> I thought maybe macros work better when executed by Nagios, so I added a
> check_host_cluster command and a host with that as its check_command.
>
> define command {
>     command_name    check_host_cluster
>     command_line    $USER1$/check_cluster --host --label=$HOSTNAME$ --warning=$ARG1$ --critical=$ARG2$ --data=$ARG3$
> }
>
> define host {
>     use              linux-server
>     host_name        ProductionCluster1
>     alias            Production Cluster 1
>     address          127.0.0.1
>     parents          gisesx1,gisesx3,gisesx4
>     check_command    check_host_cluster!1!2!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$
>     hostgroups       nogsupport
> }
>
> The check_interval for the linux-server template is set to 3. I made the
> assumption that it didn’t matter what I set the address to since I’m only
> interested in the state of other hosts, and it’s not being referenced in the
> check_command.
>
> It shows up in the host information web page as being up, but I don’t have
> any hosts named foo:
>
> Host Status: UP (for 0d 3h 41m 9s+)
> Status Information: CLUSTER OK: ProductionCluster1: 3 up, 0 down, 0 unreachable
>
> I had better luck with check_icmp, but it looks like it goes straight to
> CRITICAL if one host is down.
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null