Dear linux-ha subscribers,

we run a 2-node active-standby HB cluster, still with the legacy HB1 configuration.
Both nodes are RHEL 5.1 x86_64 systems. The HB version in use is

# /usr/lib64/heartbeat/heartbeat -V
2.1.3

Before the OS upgrade the nodes ran Fedora 3 i386 (I cannot remember which HB release).

I used to run a simple Nagios plugin script that I wrote, which merely invokes the "heartbeat -s" command and is executed through nrpe, just to get alerted if heartbeat for whatever reason isn't running on one of the nodes (which happened in the past, when a failover didn't take place as it should), e.g.

# grep check_heartbeat /etc/nagios/nrpe.cfg
command[check_heartbeat]=/usr/lib64/nagios/plugins/custom/check_heartbeat.sh

This worked fine on the old OS (and the probably older HB version used then). After I had successfully upgraded the cluster to the new OS, I wondered why my Nagios plugin always returned CRITICAL even though heartbeat was running on the node at the time. Then I discovered that the output of my check command differs depending on who executes it.

E.g. as root I get

# /usr/lib64/nagios/plugins/custom/check_heartbeat.sh
OK - heartbeat is running on nodeA

or rather, what really gets executed in that plugin, and whose output merely gets parsed, is

# /usr/lib64/heartbeat/heartbeat -s
heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...
# pgrep -P1 -fl heartbeat
31017 heartbeat: master control process

But when run as an unprivileged user, as is the case when the nrpe daemon executes the check, I get this strange result

# /usr/lib64/nagios/plugins/check_nrpe -n -H localhost -c check_heartbeat
CRITICAL - heartbeat is stopped on nodeA
# echo $?
2

or, because nrpe in reality runs as this user,

# runuser -s /bin/sh -l -c '/usr/lib64/heartbeat/heartbeat -s' munin
heartbeat is stopped. No process

How come? Is this a bug or intended behavior? I wonder whether it would be wiser for my simple Nagios plugin to just do something similar to this:

# pid=$(pgrep -P1 heartbeat) && printf "OK - Heartbeat (PID=%s) running\n" $pid
OK - Heartbeat (PID=31017) running

Regards
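
P.S. Spelled out as a complete plugin, the pgrep idea might look roughly like the sketch below. It is only a sketch under assumptions: the function name check_heartbeat, the variable rc and the wording of the CRITICAL message are placeholders, not taken from the existing check_heartbeat.sh. The exit codes follow the usual Nagios convention (0 = OK, 2 = CRITICAL), matching the "echo $?" result shown above.

```shell
#!/bin/sh
# Hypothetical pgrep-based variant of the check (names are placeholders).
# It looks for the master control process instead of parsing "heartbeat -s",
# so it should not depend on which user runs the check.

check_heartbeat() {
    # $1: process name to look for (assumed to be "heartbeat" on the nodes)
    name=$1
    # -P1 restricts the match to processes whose parent is init (PID 1)
    if pid=$(pgrep -P1 "$name"); then
        printf 'OK - Heartbeat (PID=%s) running\n' "$pid"
        return 0
    fi
    printf 'CRITICAL - no %s process found under init\n' "$name"
    return 2
}

rc=0
check_heartbeat "${1:-heartbeat}" || rc=$?
# A real plugin would finish with:  exit "$rc"
```

The final "exit" is left as a comment so the snippet can be pasted into an interactive shell without closing it; in the plugin file it would be the last line.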
