Hello,
I just hacked up a CRM Nagios plugin which works for me. It does not
check "crm_verify -LV" yet, but I am going to add that. I don't like it
very much, but it does the job for me. Is there a way to get the
information I currently check from "cibadmin -o status -Q" or something
similar, so that I don't have to guess my way through the text output of
"crm_mon -1 -r"? I currently check the following:

        - Are all nodes running?
        - Is there a DC?
        - Are all resources started?
        - Are there any orphaned resources?
        - Are there any failed actions?
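For the node check, something along these lines might work against the CIB status XML instead of the crm_mon text. It is a minimal sketch under unverified assumptions: that "cibadmin -o status -Q" emits one <node_state> element per node, and that its crmd attribute reads "online" while the CRM on that node is running. summarize_nodes is a hypothetical name, and a real XML parser would be sturdier than grep. The return codes mirror check_nodes in the attached plugin.

```sh
#!/bin/sh
# Hypothetical helper: summarise node states from CIB status XML on stdin.
# ASSUMPTION: each node is a <node_state ...> element whose crmd
# attribute is "online" while the CRM on that node is running.
# Return value follows the Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL.
summarize_nodes() {
    # Keep only the opening <node_state ...> tags, one per output line
    # (grep -o is supported by GNU and BSD grep; "|| true" because grep
    # exits 1 when nothing matches).
    tags=$(grep -o '<node_state[^>]*>' || true)

    if [ -z "$tags" ]; then
        echo "0/0 nodes online"
        return 2
    fi

    total=$(printf '%s\n' "$tags" | grep -c . || true)
    online=$(printf '%s\n' "$tags" | grep -c 'crmd="online"' || true)
    echo "${online}/${total} nodes online"

    if [ "$online" -eq 0 ]; then
        return 2        # no node online: CRITICAL
    elif [ "$online" -ne "$total" ]; then
        return 1        # some nodes down: WARNING
    fi
    return 0
}

# Usage on a cluster node:
#   cibadmin -o status -Q | summarize_nodes
```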

I attached my "check_crm" Nagios plugin. I also attached an OCF
Resource Agent that works perfectly on top of the Tomcat init script
which comes with Debian Etch. I am posting both so that they are at
least documented somewhere.
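When trying the agent by hand, it helps to read its exit code by name. A minimal sketch: ocf_rc_name is a hypothetical helper, and it covers only codes 0 to 7 of the OCF resource agent API (the same values ocf-shellfuncs exports as OCF_SUCCESS, OCF_NOT_RUNNING and so on). The agent's monitor action, for example, returns 7 for a cleanly stopped Tomcat and 1 when the HTTP probe fails.

```sh
#!/bin/sh
# Hypothetical helper: print the symbolic name of an OCF exit code.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage, ASSUMING the agent is installed as
# /usr/lib/ocf/resource.d/heartbeat/tomcattg:
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/tomcattg monitor
#   ocf_rc_name $?
```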

                Thomas
#!/usr/bin/perl -w

use strict;
use warnings FATAL => 'all';

my $exit_value = 0;
my @output = ();

my $NODES = 0;
my $NODES_ONLINE = 0;

my $RESOURCES = 0;
my $RESOURCES_ONLINE = 0;
my $RESOURCES_ORPHANED = 0;

my $ACTIONS_FAILED = 0;

my @input = `crm_mon -1 -r`;
chomp(@input);

sub
set_exit_value
{
        my $request = shift;
        if ($exit_value < $request) {
                $exit_value = $request;
        }
}

sub
check_dc
{
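        # crm_mon prints "Current DC: NONE" when no DC has been elected,
        # so the mere presence of the line is not enough.
        if (grep(/^Current DC: (?!NONE)/, @input)) {
                push(@output, "DC chosen;");

        } else {
                push(@output, "No DC chosen;");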
                set_exit_value(1);
        }
}

sub
check_nodes
{
        for my $node (grep(/^Node:/, @input)) {
                if ($node =~ /online$/) {
                        $NODES_ONLINE++;
                }
                $NODES++;
        }

        if ($NODES_ONLINE == 0) {
                set_exit_value(2);

        } elsif ($NODES != $NODES_ONLINE) {
                set_exit_value(1);
        }

        push(@output, "${NODES_ONLINE}/${NODES} nodes online;");
}

sub
check_ressources
{
        my $section;

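        {
                my $text = join("\n", @input);
                # Take the lines between the header and the next blank line
                # (or the end of the output); fall back to "" when there is
                # no such section so that split() below never sees undef.
                $section = ($text =~ /Full list of resources:\n\n(.*?)(?:\n\n|\z)/s) ? $1 : "";
        }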

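        for my $line (split("\n", $section)) {
                # Match only the state column after "(class::type):" so that
                # group headers such as "Master/Slave Set:" are not counted;
                # a Slave instance is running, so it counts as online.
                if ($line =~ /\):\s+(Started|Master|Slave)\b/) {
                        $RESOURCES_ONLINE++;
                        $RESOURCES++;

                } elsif ($line =~ /\):\s+Stopped\b/) {
                        $RESOURCES++;
                }

                if ($line =~ /ORPHANED/) {
                        $RESOURCES_ORPHANED++;
                }
        }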

        if ($RESOURCES) {

                if ($RESOURCES_ONLINE == 0) {
                        set_exit_value(2);

                } elsif ($RESOURCES != $RESOURCES_ONLINE) {
                        set_exit_value(1);
                }

                push(@output, "${RESOURCES_ONLINE}/${RESOURCES} resources online;");
        }

        if ($RESOURCES_ORPHANED) {
                set_exit_value(1);
                push(@output, "${RESOURCES_ORPHANED} orphaned resources;");
        }
}

sub
check_for_failed_actions
{
        my @section = @input;

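        shift(@section) while (defined($section[0]) && $section[0] !~ /^Failed actions:$/);

        # Every indented line below the header is one failed operation,
        # whatever status text it ends with (not only "Error").
        for my $line (@section) {
                if ($line =~ /^\s+\S/) {
                        $ACTIONS_FAILED++;
                }
        }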

        if (@section) {
                push(@output, "$ACTIONS_FAILED failed actions;");
                set_exit_value(1);
        }
}


check_dc();
check_nodes();
check_ressources();
check_for_failed_actions();

if ($exit_value == 0) {
        print "OK - ";

} elsif ($exit_value == 1) {
        print "WARNING - ";

} elsif ($exit_value == 2) {
        print "CRITICAL - ";

} else {
        print "UNKNOWN - ";
}

print join (" ", @output);
print "\n";

exit($exit_value);

__DATA__

============
Last updated: Wed Jan  2 09:11:40 2008
Current DC: postgres-01 (24a3fa1b-6b62-470c-a6e1-4c1598875018)
2 Nodes configured.
2 Resources configured.
============

Node: postgres-02 (211523e0-a549-49b7-bf29-f646915698ef): online
Node: postgres-01 (24a3fa1b-6b62-470c-a6e1-4c1598875018): online

Full list of resources:

Master/Slave Set: ms-drbd0
    drbd0:0     (heartbeat::ocf:drbd):  Stopped 
    drbd0:1     (heartbeat::ocf:drbd):  Stopped 
Resource Group: postgres-cluster
    fs0 (heartbeat::ocf:Filesystem):    Stopped 
    ip0 (heartbeat::ocf:IPaddr2):       Stopped 
    pgsql0      (heartbeat::ocf:pgsql): Stopped 

Failed actions:
    drbd0:0_start_0 (node=postgres-01, call=6, rc=1): Error
    drbd0:1_start_0 (node=postgres-01, call=9, rc=1): Error
    drbd0:0_start_0 (node=postgres-02, call=6, rc=1): Error
    drbd0:1_start_0 (node=postgres-02, call=9, rc=1): Error


============
Last updated: Mon Dec 31 13:52:22 2007
Current DC: tomcat-02 (e2607dae-3635-495e-b14f-f90f5dbb4a0e)
1 Nodes configured.
1 Resources configured.
============

Node: tomcat-02 (e2607dae-3635-495e-b14f-f90f5dbb4a0e): online

Full list of resources:

tomcat  (heartbeat::ocf:tomcattg):      Stopped
tomcat-02       (heartbeat::ocf:tomcattg ORPHANED):     Started tomcat-02


============
Last updated: Wed Jan  2 19:51:53 2008
Current DC: postgres-01 (24a3fa1b-6b62-470c-a6e1-4c1598875018)
2 Nodes configured.
2 Resources configured.
============

Node: postgres-02 (211523e0-a549-49b7-bf29-f646915698ef): online
Node: postgres-01 (24a3fa1b-6b62-470c-a6e1-4c1598875018): online

Full list of resources:

Master/Slave Set: ms-drbd0
    drbd0:0     (heartbeat::ocf:drbd):  Master postgres-02
    drbd0:1     (heartbeat::ocf:drbd):  Slave postgres-01
Resource Group: postgres-cluster
    fs0 (heartbeat::ocf:Filesystem):    Started postgres-02
    ip0 (heartbeat::ocf:IPaddr2):       Started postgres-02
    pgsql0      (heartbeat::ocf:pgsql): Started postgres-02
#!/bin/sh 

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# OCF Resource Agent on top of tomcat init script shipped with Debian.  #
#                                  Thomas Glanzmann --tg 21:22 07-12-30 #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

#       This script manages a Heartbeat Tomcat instance
#       usage: $0 {start|stop|status|monitor|meta-data}
#       OCF exit codes are defined via ocf-shellfuncs 

. "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"

case  "$1" in
        start)
                /etc/init.d/tomcat5.5 start > /dev/null 2>&1 && exit || exit 1
        ;;

        stop)
                /etc/init.d/tomcat5.5 stop > /dev/null 2>&1 && exit || exit 1
        ;;

        status)
                /etc/init.d/tomcat5.5 status > /dev/null 2>&1 && exit || exit 1
        ;;

        monitor)
                # Check if the resource is stopped (7 = OCF_NOT_RUNNING)
                /etc/init.d/tomcat5.5 status > /dev/null 2>&1 || exit 7

                # Otherwise check services (XXX: Maybe loosen retry / timeout)
                wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:8180/eccar/ && exit || exit 1
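        ;;

        validate-all)
                # Only checks that the wrapped init script is installed
                # (5 = OCF_ERR_INSTALLED)
                [ -x /etc/init.d/tomcat5.5 ] && exit 0 || exit 5
        ;;
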
        meta-data)
                cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="tomcattg">
<version>1.0</version>

<longdesc lang="en">
OCF Resource Agent on top of the Tomcat init script shipped with Debian.
</longdesc>

<shortdesc lang="en">OCF Resource Agent on top of the Tomcat init script shipped with Debian.</shortdesc>

<actions>
<action name="start"   timeout="90" />
<action name="stop"    timeout="100" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="10s" />
<action name="meta-data"  timeout="5s" />
<action name="validate-all"  timeout="20s" />
</actions>
</resource-agent>
END
        ;;

        *)
                # Unknown action (3 = OCF_ERR_UNIMPLEMENTED)
                exit 3
        ;;
esac
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
