Hello all, I was wondering if anybody here could test this NetApp script for me. Basically you need a (test) NetApp that you can afford to yank a drive out of, bringing the array to a degraded state. Then you can put the drive back in and let the array start rebuilding. If you configure an alert and an upalert in mon, you should get a notice (whatever type you use) when it detects a degraded array and again when it detects that reconstruction has finished. However, it won't give a notice when the status changes from "degraded" to "reconstruct", since the script treats both as the same failure.
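For anyone setting this up to test, a minimal mon.cf sketch might look like the following. The hostgroup name, service name, interval, and mail address are placeholders, and it assumes the stock mail.alert is installed. Note that the script skips any host not listed in its own config file (netappraidstat.cf by default), so the filers need to appear there as well.

```
hostgroup netapps filer1 filer2

watch netapps
    service raidstat
        interval 5m
        monitor netappraidstat.monitor --version 1
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
            upalert mail.alert admin@example.com
            alertevery 1h
```

With something like this in place, the alert should fire when the monitor exits non-zero and the upalert when it returns to zero.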
This is basically the netappfree.monitor script converted to use volTable items instead of dfTable items. I also added 'use strict' to the new script and a --version option (my NetApps need SNMP version 1, but my net-snmp libs default to version 3). Here's a sample output:

admin51 alert.d # perl ../mon.d/netappraidstat.monitor --list filer1 filer2
filer           ONTAP      Volume Name Vol State           Vol Status
-------------------------------------------------------------------------
filer1          6.5        vol0           online                raid4
filer2          6.5        vol0           online                raid4

It is seemingly working fine (indicates all OK) on our production system here, but I don't have a test system that I can yank a drive out of. Can anybody here do so? I would be most appreciative if someone could verify that it detects "degraded" and "reconstruct" status and triggers the defined alert.

I've considered adding a feature to check whether any drive has failed and a spare has been used. But during the reconstruct phase you should get an alert anyway, so that seems to be overkill. It also seems like it should check for raid_dp conditions, but we don't have raid_dp set on any machines, so I can't check that yet either (we plan to do that upgrade soon).

The netappraidstat.monitor script is quoted inline in this email instead of attached via MIME (I recall that this ML does MIME stripping by default).

#!/usr/bin/perl
#
# Use SNMP to get raid status from a Network Appliance
# exits with value of 1 if an array has status "degraded" or
# "reconstruct", or exits with the value of 2 if there is a
# "soft" error (SNMP library error, or could not get a
# response from the server).
#
# This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
# module.
#
# Borrowed heavily from framework of netappfree.monitor.
# Originally by Jim Trocki.  Modified by Theo Van Dinter
# ([EMAIL PROTECTED], [EMAIL PROTECTED]) to add verbose error output,
# more error checking, etc.
# Can be used in conjunction with
# snapdelete.alert to auto-remove snapshots if needed.
#
# $Id:$
#
#    Copyright (C) 1998, Jim Trocki
#    Copyright (C) 1999-2001, Theo Van Dinter
#    Copyright (C) 2005, Todd Lyons
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use strict;
use SNMP;
use Getopt::Long;

sub list;
sub readcf;

$ENV{"MIBS"} = 'RFC1213-MIB:NETWORK-APPLIANCE-MIB';

my %opt;
GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s",
    "version=s", "list");

die "no host arguments\n" if (@ARGV == 0);

my $RET = 0;
my @ERRS = ();
my %HOSTS = ();
my %ARRAY = ();         # disk array names
my ($s, $v);            # handles to snmp objects
my ($listhost, $ver);   # hostname and version of ONTap

my $COMM = $opt{"community"} || "public";
my $TIMEOUT = $opt{"timeout"} * 1000 * 1000 || 2000000;
my $RETRIES = $opt{"retries"} || 5;
# Reading the config file is very liberal, reads first argument, ignores
# the rest.  Allows you to symlink to existing netappfree.cf (no need to
# keep separate config files).
my $CONFIG = $opt{"config"} || (-d "/etc/mon" ? "/etc/mon" : "/usr/lib/mon/etc") .
    "/netappraidstat.cf";
my $VERSION = $opt{"version"} || 1;

# Positions of each column in the VarList.  Assigned before list() is
# called so that --list mode sees the real indices rather than undef.
my ($volIndex, $volName, $volState, $volStatus) = (0..3);

list (@ARGV) if ($opt{"list"});

readcf ($CONFIG) || die "could not read config: $!\n";

foreach my $host (@ARGV) {
    # Only check hosts that appear in the config file
    next if (!defined $ARRAY{$host});

    if (!defined($s = new SNMP::Session (DestHost => $host,
            Timeout => $TIMEOUT, Community => $COMM,
            Retries => $RETRIES, Version => $VERSION))) {
        $RET = ($RET == 1) ? 1 : 2;
        $HOSTS{$host} ++;
        push (@ERRS, "could not create session to $host: " .
            $SNMP::Session::ErrorStr);
        next;
    }

    $v = new SNMP::VarList (
        ['volIndex'],
        ['volName'],
        ['volState'],
        ['volStatus'],
    );

    if ( $v->[$volIndex]->tag !~ /^vol/ ) {
        push(@ERRS,"OIDs not mapping correctly!  Check that NetApp MIB is available!");
        $RET = 1;
        last;
    }

    # Walk volTable one row at a time until we run off the end of it
    while (defined $s->getnext($v)) {
        last if ($v->[$volIndex]->tag !~ /volIndex/);

        if ($v->[$volStatus]->val =~ /degraded|reconstruct/ ) {
            $HOSTS{$host}++;
            push (@ERRS, sprintf ("%s is %s, status: '%s'",
                $host, $v->[$volState]->val, $v->[$volStatus]->val) );
            $RET = 1;
        }
    }

    if ($s->{ErrorNum}) {
        $HOSTS{$host} ++;
        push (@ERRS, "could not get volIndex for $host: " . $s->{ErrorStr});
        $RET = ($RET == 1) ? 1 : 2;
    }
}

if ($RET) {
    print join(" ", sort keys %HOSTS), "\n\n", join("\n", @ERRS), "\n";
}

exit $RET;

#
# read configuration file
#
sub readcf {
    my ($f) = @_;
    my ($l, $host, $dummy);

    open (CF, $f) || return undef;
    while (<CF>) {
        next if (/^\s*#/ || /^\s*$/);
        chomp;
        ($host, $dummy) = split;
        if (!defined ($ARRAY{$host} = $host)) {
            die "error, cannot extract hostname, config $f, line $.\n";
        }
    }
    close (CF);
    return 1;   # explicit success, since the caller tests the return value
}

#
# Don't use config, instead just dump all data returned from netapp
#
sub list {
    my (@hosts) = @_;

    foreach my $host (@hosts) {
        if (!defined($s = new SNMP::Session (DestHost => $host,
                Timeout => $TIMEOUT, Community => $COMM,
                Retries => $RETRIES, Version => $VERSION))) {
            print STDERR "could not create session to $host: " .
                $SNMP::Session::ErrorStr, "\n";
            next;
        }

        $listhost = $host;      # Handles global scope in --list mode
        $ver = $s->get(['sysDescr', 0]);
        $ver =~ s/^netapp.*release\s*([^:]+):.*$/$1/i;

        $v = new SNMP::VarList (
            ['volIndex'],
            ['volName'],
            ['volState'],
            ['volStatus'],
        );

        while (defined $s->getnext($v)) {
            last if ($v->[$volIndex]->tag !~ /volIndex/);
            write;
        }
    }
    exit 0;
}

format STDOUT_TOP =
filer           ONTAP      Volume Name Vol State           Vol Status
-------------------------------------------------------------------------
.

format STDOUT =
@<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<< @>>>>>>>>>> @>>>>>>>>>>>>>>>>>>>
$listhost, $ver, $v->[1]->[2], $v->[2]->[2], $v->[3]->[2]
.

-- 
Regards...		Todd
OS X: We've been fighting the "It's a mac" syndrome with upper management
for years now.  Lately we've taken to just referring to new mac
installations as "Unix" installations when presenting proposals and
updates.  For some reason, they have no problem with that.          -- /.