Dear linux-ha subscribers,

We run a two-node active-standby Heartbeat (HB) cluster, still using the
legacy HB1 configuration.

Both nodes are RHEL 5.1 x86_64 systems.

The HB version in use is:

# /usr/lib64/heartbeat/heartbeat -V
2.1.3

Before the OS upgrade, the nodes ran Fedora 3 i386 (I cannot remember which
HB release).

I have been running a simple Nagios plugin script that I wrote, executed
through NRPE, which merely invokes the "heartbeat -s" command. Its only
purpose is to alert me if heartbeat, for whatever reason, is not running on
one of the nodes (which has happened in the past, when a failover did not
take place as it should). The NRPE command definition is:

# grep check_heartbeat /etc/nagios/nrpe.cfg 
command[check_heartbeat]=/usr/lib64/nagios/plugins/custom/check_heartbeat.sh

This worked fine on the old OS (and probably with the older HB version used
back then).

After successfully upgrading this cluster to the new OS, I wondered why my
Nagios plugin always returned CRITICAL even though heartbeat was running on
the node at the time. I then discovered that the output of the check differs
completely depending on which user executes it.

For example, as root I get:

# /usr/lib64/nagios/plugins/custom/check_heartbeat.sh
OK - heartbeat is running on nodeA

or rather, the command the plugin actually executes, and whose output it
merely parses, gives:

# /usr/lib64/heartbeat/heartbeat -s
heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...


# pgrep -P1 -fl heartbeat
31017 heartbeat: master control process


But when the check runs as an unprivileged user, as is the case when the
NRPE daemon executes it, I get this strange result:

# /usr/lib64/nagios/plugins/check_nrpe -n -H localhost -c check_heartbeat
CRITICAL - heartbeat is stopped on nodeA
 
# echo $?
2

or directly, since in reality NRPE runs as this user:

# runuser -s /bin/sh -l -c '/usr/lib64/heartbeat/heartbeat -s' munin
heartbeat is stopped. No process
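For illustration, here is a hypothetical sketch of the parsing step in a
wrapper like check_heartbeat.sh (the real script is not shown above, so the
function name hb_status_to_nagios and the exact string matching are
assumptions). It maps the two "heartbeat -s" outputs above to the usual
Nagios plugin return codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), which makes
clear why the unprivileged run ends up CRITICAL: the plugin only ever sees
the text "is stopped".

```shell
#!/bin/sh
# Hypothetical sketch of the parsing step in a check_heartbeat.sh-style
# Nagios plugin. Nagios return codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.

# $1 = text printed by "heartbeat -s", $2 = node name to report.
# Prints one Nagios status line and returns the matching exit code.
hb_status_to_nagios() {
    status_text=$1
    node=$2
    case $status_text in
        *"heartbeat OK"*)
            echo "OK - heartbeat is running on $node"
            return 0 ;;
        *"is stopped"*)
            # Also hit when "heartbeat -s" runs unprivileged, as shown above.
            echo "CRITICAL - heartbeat is stopped on $node"
            return 2 ;;
        *)
            echo "UNKNOWN - unexpected output: $status_text"
            return 3 ;;
    esac
}

# In the actual plugin the last two lines would be something like:
#   hb_status_to_nagios "$(/usr/lib64/heartbeat/heartbeat -s 2>&1)" "$(uname -n)"
#   exit $?
```

A wrapper like this can only be as accurate as "heartbeat -s" itself: it
faithfully reports CRITICAL whenever that command claims heartbeat is
stopped, even while the master control process is running.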


How come? Is this a bug or intended behavior?

In that case, wouldn't it be wiser for my simple Nagios plugin to just do
something like this?

# pid=$(pgrep -P1 heartbeat) && printf "OK - Heartbeat (PID=%s) running\n" $pid
OK - Heartbeat (PID=31017) running
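A slightly fuller version of that one-liner, as a sketch only: it adds the
CRITICAL and UNKNOWN branches a Nagios plugin needs, and uses pgrep's -d
option to join the PIDs should more than one process match. The function
name check_heartbeat_pgrep is hypothetical. Since pgrep only reads the
process table instead of querying heartbeat itself, it should not depend on
the caller's privileges the way "heartbeat -s" apparently does here, though
that is worth verifying as the NRPE user.

```shell
#!/bin/sh
# Sketch of a pgrep-based heartbeat check for NRPE (hypothetical, not the
# original plugin). Nagios return codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.

# $1 = process name pattern to look for (defaults to "heartbeat").
check_heartbeat_pgrep() {
    name=${1:-heartbeat}
    # -P 1: only processes whose parent is init (the master control process);
    # -d ' ': separate several PIDs with spaces instead of newlines.
    pids=$(pgrep -P 1 -d ' ' "$name")
    rc=$?
    case $rc in
        0) printf 'OK - %s (PID=%s) running\n' "$name" "$pids"
           return 0 ;;
        1) printf 'CRITICAL - no %s master process found\n' "$name"
           return 2 ;;
        *) printf 'UNKNOWN - pgrep failed with exit code %s\n' "$rc"
           return 3 ;;
    esac
}

# In an NRPE plugin script the last two lines would be:
#   check_heartbeat_pgrep heartbeat
#   exit $?
```

pgrep exits with 1 when nothing matches, so only that case is treated as
CRITICAL; any other failure (including pgrep itself missing) is reported as
UNKNOWN rather than as a cluster problem.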


Regards


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
