Hello,

We are facing a strange issue while trying to monitor Oracle RAC
services.

Oracle RAC is running on AIX 5.3 and Nagios is running on Fedora Core 9.

The scripts we are using to monitor Oracle RAC services on AIX are as follows:

-------------------------
$ cat check_oracle_services.sh

#!/usr/bin/ksh
# found on the Internet
RSC_KEY=$1

/oracle/crs_home/bin/crs_stat -u | awk \
        'BEGIN { FS="="; state = 0; } \
        $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1}; \
        state == 0 {next;} \
        $1~/TARGET/ && state == 1 {apptarget = $2; state=2;} \
        $1~/STATE/ && state == 2 {appstate = $2; state=3;} \
        state == 3 {printf "%-45s %-18s\n", appname, appstate; state=0;}'
-------------------------

$ cat check_oracle_services.pl

#!/usr/bin/env perl

use strict;
use Getopt::Std;

my %return_value = (
        OK => 0,
        CRIT => 2,
        UNKNOWN => 3
);

my $message = "nagios";
my $exit_status;

my %opt=();
getopts("p:h", \%opt);

sub usage(){
        print "Usage: $0 -p service_name\n";
        exit $return_value{'UNKNOWN'};
}

usage() if defined $opt{'h'};

my $SERVICE = $opt{'p'} if defined $opt{'p'} || usage();

# the following code was added to make sure that nrpe was not getting confused
# with dotted argument
if ($SERVICE =~ "foo") {
        $SERVICE = "ora.foo.bar.inst";
}

my $PIPED = qx/ ksh check_oracle_services.sh $SERVICE/;
print $PIPED;

if ($PIPED =~ /OFFLINE/g) {
        $exit_status = $return_value{'CRIT'};
        $message = "Critical: $SERVICE is not running.";
} else {
        $exit_status = $return_value{'OK'};
        $message = "OK: $SERVICE is running.";
}

print "$message\n";
exit $exit_status;
-------------------------

When we run this script locally on the AIX host, the output is as follows:

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p foo
ora.foo.bar.inst                     OFFLINE
Critical: ora.foo.bar.inst is not running.

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foo.bar.inst
ora.foo.bar.inst                     OFFLINE
Critical: ora.foo.bar.inst is not running.

The service is indeed offline.

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foodb.bardb1.inst
ora.foodb.bardb1.inst                         ONLINE on srv01
OK: ora.foodb.bardb1.inst is running.


However, when we run the same check from the Nagios server through NRPE,
it reports the services as online even when they are not:

[r...@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a ora.foo.bar.inst
OK: ora.foo.bar.inst is running.

[r...@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a foo
OK: ora.foo.bar.inst is running.

It is strange that we get the correct status when the scripts are
executed locally but the wrong status when they are executed remotely
through NRPE.
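
One thing we have not ruled out: the Perl wrapper calls
check_oracle_services.sh by a relative path and reports OK whenever the
output does not contain OFFLINE, which includes the case where the output
is empty. If NRPE starts the plugin from a different working directory or
with a different environment (an unconfirmed assumption on our part), the
helper script might not run at all, and that would look exactly like the
false OK above. A minimal sketch of a stricter classification follows;
classify_state is a hypothetical helper name and is not part of the
scripts above.

```shell
#!/bin/sh
# Sketch only: classify_state is a hypothetical helper, not part of the
# scripts in this post. It maps the text printed by check_oracle_services.sh
# to a Nagios exit code and treats empty output as UNKNOWN instead of OK.
classify_state() {
    output=$1
    if [ -z "$output" ]; then
        # Nothing came back: the helper script may not have run at all.
        echo "UNKNOWN: no output from check_oracle_services.sh"
        return 3
    fi
    case $output in
        *OFFLINE*) echo "CRITICAL: resource is OFFLINE"; return 2 ;;
        *ONLINE*)  echo "OK: resource is ONLINE"; return 0 ;;
        *)         echo "UNKNOWN: unrecognised output"; return 3 ;;
    esac
}
```

It could be fed with something like
classify_state "$(ksh /home/nagios/nrpe/libexec/check_oracle_services.sh ora.foo.bar.inst)",
where the absolute path is assumed from the prompt shown in the local runs,
so that the result would no longer depend on the working directory.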

Has anyone faced a similar issue?  I would appreciate it if someone could
offer some insight into this.

Thanks
