Hello,

We are facing a strange issue while trying to monitor Oracle RAC services.
Oracle RAC is running on AIX 5.3 and nagios is running on Fedora Core 9. The scripts we are using to monitor Oracle RAC services on AIX are as follows:

-------------------------
$ cat check_oracle_services.sh
#!/usr/bin/ksh
# found on the Internet
RSC_KEY=$1
/oracle/crs_home/bin/crs_stat -u | awk \
'BEGIN { FS="="; state = 0; } \
$1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1}; \
state == 0 {next;} \
$1~/TARGET/ && state == 1 {apptarget = $2; state=2;} \
$1~/STATE/ && state == 2 {appstate = $2; state=3;} \
state == 3 {printf "%-45s %-18s\n", appname, appstate; state=0;}'
-------------------------
$ cat check_oracle_services.pl
#!/usr/bin/env perl
use strict;
use Getopt::Std;

my %return_value = ( OK => 0, CRIT => 2, UNKNOWN => 3 );
my $message = "nagios";
my $exit_status;
my %opt=();
getopts("p:h", \%opt);

sub usage(){
    print "Usage: $0 -p service_name\n";
    exit $return_value{'UNKNOWN'};
}

usage() if defined $opt{'h'};
my $SERVICE = $opt{'p'} if defined $opt{'p'} || usage();

# the following code was added to make sure that nrpe was not getting confused
# with dotted argument
if ($SERVICE =~ "foo") {
    $SERVICE = "ora.foo.bar.inst";
}

my $PIPED = qx/ ksh check_oracle_services.sh $SERVICE/;
print $PIPED;
if ($PIPED =~ /OFFLINE/g) {
    $exit_status = $return_value{'CRIT'};
    $message = "Critical: $SERVICE is not running.";
} else {
    $exit_status = $return_value{'OK'};
    $message = "OK: $SERVICE is running.";
}
print "$message\n";
exit $exit_status;
-------------------------

When we try to run this script on AIX (local system) the output is as follows:

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p foo
ora.foo.bar.inst OFFLINE
Critical: ora.foo.bar.inst is not running.
[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foo.bar.inst
ora.foo.bar.inst OFFLINE
Critical: ora.foo.bar.inst is not running.
The service is indeed offline.

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foodb.bardb1.inst
ora.foodb.bardb1.inst ONLINE on srv01
OK: ora.foodb.bardb1.inst is running.

Now, when we run the same check from the Nagios server, it reports the services as online even when they are not:

[r...@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a ora.foo.bar.inst
OK: ora.foo.bar.inst is running.
[r...@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a foo
OK: ora.foo.bar.inst is running.

It is strange that we get the correct status when the scripts are executed locally but the wrong status when they are executed remotely. Has anyone faced a similar issue? I would appreciate any insights on this.

Thanks
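
A note on the decision logic in check_oracle_services.pl: it reports OK whenever the helper output does not contain OFFLINE, so an empty output is also treated as OK. The remote runs above show only the final message and not the status line printed from $PIPED, which is consistent with the helper returning nothing under NRPE. Why it would return nothing is not confirmed here; a relative script path or a different environment under the NRPE daemon is only an assumption. The shell sketch below shows a stricter classification that returns UNKNOWN for empty or unexpected output. The function name classify_status is hypothetical and not part of the original scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): classify the text
# printed by check_oracle_services.sh. Prints a Nagios-style message and
# returns the matching plugin exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
classify_status() {
    service=$1
    output=$2
    case $output in
        *OFFLINE*)
            echo "Critical: $service is not running."
            return 2
            ;;
        *ONLINE*)
            echo "OK: $service is running."
            return 0
            ;;
        *)
            # Empty or unexpected output: the helper most likely did not run,
            # so the check should not claim the service is healthy.
            echo "Unknown: no status returned for $service."
            return 3
            ;;
    esac
}
```

It could be called as classify_status "$svc" "$(ksh /home/nagios/nrpe/libexec/check_oracle_services.sh "$svc" 2>&1)". The absolute path is an assumption based on the directory shown in the local prompt. With this logic, a helper that fails silently under NRPE would show up as UNKNOWN instead of OK.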