On Tue, Jan 11, 2005 at 08:56:49PM -0500, Ed Ravin wrote:
> Has anyone hacked any of the disk space monitors to monitor inode
> consumption? [...]

Oh well, I had to do my own hacking.  Along the way, I fixed a few
things and added a few new features.  If you're using this monitor,
I encourage you to try this version and let me know how it works
for you.  Even if you don't need the inode monitoring feature, you
will probably like the new "--listall" feature that shows exactly
which filesystems are monitored and what the thresholds are for
that filesystem (i.e. a good way to debug your config file).

To monitor inodes, you need three things: a recent net-snmp (I'm using
5.2.1.rc2; 5.2.1.rc3 just came out and should be just as good), the line
"includeAllDisks" in snmpd.conf (or a manual "disk" entry for each
filesystem you want monitored, but who's got time for that?), and the
"--usemib ucd" option when you run this monitor.  To change the inode
monitoring threshold (default 5%), add a 4th column to the config file
as needed.
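
As a rough illustration of the inode check (a standalone sketch, not the
attached monitor itself): the UCD dskTable reports dskPercentNode as the
percentage of inodes in use, so the test compares 100 minus that value
against the minimum free percentage.  The helper name inode_alarm and the
sample numbers are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when free inodes fall below the threshold.
# $pct_used is dskPercentNode (percent of inodes in use), or "N/A" when the
# agent does not supply it.  $min_free is the minimum free percentage;
# 0 disables the check.
sub inode_alarm {
    my ($pct_used, $min_free) = @_;
    return 0 if $min_free <= 0 || $pct_used eq "N/A";
    return (100 - $pct_used) < $min_free ? 1 : 0;
}

print inode_alarm(97, 5), "\n";      # 3% free against a 5% threshold: prints 1
print inode_alarm(50, 5), "\n";      # plenty of inodes left: prints 0
print inode_alarm("N/A", 5), "\n";   # agent gave no inode data: prints 0
```

A filesystem with no inode data is never alarmed on, which is why the
"--listall" output is worth checking before you rely on the 4th column.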

Summary of changes:

New stuff:

* monitors inode usage (with "--usemib ucd" option and 4th column in config
file)

* choose which MIB you want to use ("--usemib" option)

* list out all monitored filesystems, with the parameters for alarming
("--listall" option).  This shows you exactly what the thresholds are
for each filesystem.

* debug output ("--debug" option)

Changed stuff:

* all failure messages are now prefixed with the hostname, as per normal
Mon style.  They also display the threshold, so you can tell from the
failure message what level you were testing for.

* when using the UCD MIB, no longer fails if the agent has a "sparse MIB"

* recognize more devices as local disk in UCD MIB.  Bug: this should be
a configurable option.

* added --ifree option to set default inode threshold if needed.
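
To show what the new failure message layout looks like (hostname first,
threshold at the end), here is a small standalone sketch that copies the
format string used for the percentage alarm.  The helper name
failure_message and the sample host and values are invented for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the layout of the percentage alarm:
# "<host>: filesystem <fs> is (<used>% full), <free>MB free
#  (below threshold <n>% free)", all on one line.
sub failure_message {
    my ($host, $fs, $pct_full, $bytes_free, $min_pct_free) = @_;
    return sprintf("%s: filesystem %s is (%1.1f%% full), %1.0fMB free (below threshold %s%% free)",
        $host, $fs, $pct_full, $bytes_free / 1048576, $min_pct_free);
}

# Sample values only: 97.2% full, 300MB free, 5% minimum free.
print failure_message("www1", "/var", 97.2, 300 * 1048576, 5), "\n";
# prints: www1: filesystem /var is (97.2% full), 300MB free (below threshold 5% free)
```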

The new snmpdiskspace.monitor and .cf are attached.

        -- Ed


#!/usr/local/bin/perl
#
# NAME
#  snmpdiskspace.monitor
#
#
# SYNOPSIS
#  snmpdiskspace.monitor [--list] [--listall] [--timeout seconds]
#                        [--config filename] [--community string]
#                        [--free minfree] [--ifree minifree]
#                        [--retries retries] [--usemib <mibtype>]
#                        [--debug] host...
#
#
# DESCRIPTION
#  This script uses the Host Resources MIB (RFC1514), and optionally
#  the MS Windows NT Performance MIB, or UCD-SNMP extensions
#  (enterprises.ucdavis.dskTable.dskEntry) to monitor diskspace on hosts
#  via SNMP.
#
#  snmpdiskspace.monitor uses a config file to allow the specification of
#  minimum free space on a per-host and per-partition basis. The config 
#  file allows the use of regular expressions, so it is quite flexible in
#  what it can allow. See the sample config file for more details and
#  syntax.
#
#  The script only checks disks marked as "FixedDisks" by the Host MIB,
#  which should help cut down on the number of CD-ROM drives 
#  erroneously reported as being full! Since the drive classification
#  portion of the UCD Host MIB isn't too great on many OS'es, though,
#  this won't buy you a lot. Empire's SNMP agent gets this right on
#  all the hosts that I checked, though. Not sure about the MS MIB.
#  When using the UCD MIB, only devices whose names match md, hd, wd, sd,
#  ida or raid are checked.
# 
#  snmpdiskspace.monitor is intended for use as a monitor for the mon
#  network monitoring package.
#
#
# OPTIONS
#  --community   The SNMP community string to use. Default is "public".
#  --config      The config file to use. Default is either 
#                /etc/mon/snmpdiskspace.cf or 
#                /usr/lib/mon/mon.d/snmpdiskspace.cf, in that order.
#  --retries     The number of retries to use, if we get an SNMP timeout.
#                Default is retry 5 times.
#  --timeout     Seconds to wait before declaring a timeout on an SNMP get.
#                Default is 20 seconds.
#  --free        The default minimum free space, in a percentage or absolute
#                quantity, as per the config file. Thus, arguments of, for
#                example, "20%", "1gb", "50mb" are all valid.
#                Default is 5% free on every partition checked.
#
#  --ifree       The default minimum free inode percentage, specified as
#                a percentage.  Default is 5% free.
#
#  --list        Give a verbose listing of all partitions checked on all 
#                specified hosts.
#
#  --listall     like --list, but also lists the thresholds defined for
#                each filesystem, so you can doublecheck the config file
#
#  --usemib      Choose which MIB to use: one or more of host, perf, ucd
#                Default tries all three, in that order
#
#  --debug       enable debug output for config file parsing and MIB fetching
#
#
# EXIT STATUS
#  Exit status is as follows:
#    0     No problems detected.
#    1     Free space on any host was below the supplied parameter.
#    2     A "soft" error occurred, either a SNMP library error, 
#          or could not get a response from the server. 
#
#  In the case where both a soft error and a freespace violation are
#  detected, exit status is 1.
#
# BUGS
# When using the net-snmp agent, you must build it with "--with-dummy-values"
# or the monitor may not parse the Host Resources MIB properly.
#
# List of local filesystem types used when parsing the UCD MIB should be
# configurable.
#
#
# NOTES
# $Id: snmpdiskspace.monitor,v 1.5 2005/01/13 23:40:35 root Exp root $
#
#  * Added support for inode status via UCD-SNMP MIB.  Fourth column in config
#  file (optional) is for inode%.
#  * added --debug and --usemib options.  Latter needed so you can force use
#  of UCD mib if you want inode status.
#  * rearranged the error messages to be more Mon-like (hostname first)
#  * added code to synchronize instance numbers when using UCD MIB.  This
#  could solve the "sparse MIB" problem usually fixed by the
#  --with-dummy-values option in net-snmp if needed for other agents
#  Ed Ravin ([EMAIL PROTECTED]), January 2005
#
#  Added support for regex hostnames and partition names in the config file,
#  'use strict' by andrew ryan <[EMAIL PROTECTED]>.
#
#  Generalised to handle multiple mibs by jens persson <[EMAIL PROTECTED]>
#  Changes Copyright (C) 2000, jens persson
#
#  Modified for use with UCD-SNMP by Johannes Walch for 
#  NWE GmbH ([EMAIL PROTECTED])
#
#  Support for UCD's disk MIB added by Matt Simonsen <[EMAIL PROTECTED]>
#
#
# SEE ALSO
#  mon: http://www.kernel.org/software/mon/
#
#  This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
#  module. (http://ucd-snmp.ucdavis.edu and CPAN, respectively).
#
#  The Empire SystemEdge SNMP agent: http://www.empire.com
#
#
# COPYRIGHT
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use strict;
use SNMP;
use Getopt::Long;

sub readcf;
sub toBytes;
sub get_values;

# setup what mibs to use
# $ENV{"MIBS"} = 'RFC1213-MIB:HOST-RESOURCES-MIB:WINDOWS-NT-PERFORMANCE:UCD-SNMP-MIB';
$ENV{"MIBS"} = 'RFC1213-MIB:HOST-RESOURCES-MIB:UCD-SNMP-MIB';

my %opt;

# parse the commandline
GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s", "list",
            "listall", "free=s", "ifree=n", "usemib=s", "debug");

die "No host arguments given!\n" if (@ARGV == 0);

my $RET = 0;   #exit value of script
my @ERRS = (); # array holding detail output
my @HOSTS = ();  # array holding summary output
my @cfgfile = ();  #array holding contents of config file


# Read in defaults
my $COMM        = $opt{"community"} || $ENV{"COMMUNITY"} || "public";
# SNMP::Session expects Timeout in microseconds; --timeout is given in seconds
my $TIMEOUT     = ($opt{"timeout"} || 20) * 1000000;   #default timeout is 20 seconds
my $RETRIES     = $opt{"retries"} || 5;
my $CONFIG      = $opt{"config"} || (-d "/etc/mon" ? "/etc/mon" : "/usr/lib/mon/mon.d")
    . "/snmpdiskspace.cf";
# --free accepts the same syntax as the config file ("20%", "1gb", "50mb")
my $DISKFREE  = defined ($opt{"free"}) ? toBytes ($opt{"free"}) : -5;   #default max % full is 95%
die "Invalid --free value: $opt{free}\n" unless defined ($DISKFREE);
my $INODEFREE = $opt{"ifree"} ||  5;   #default max % inode full is 95%
my $USEMIB= $opt{"usemib"} || "host perf ucd";

my $LIST= $opt{"list"} || $opt{"listall"} || 0;
my $LISTALL= $opt{"listall"} || 0;
my $DEBUG= $opt{"debug"} || 0;


my ($host, $checkval, $icheckval, %FREE, $disk, @disklist, $cfgline);

# read the config file
if ( !readcf ($CONFIG) ) {
    # not being able to read config file shouldn't be a fatal, since we
    # have defaults we can use.
    print STDERR "readcf: Could not read config file $CONFIG: $!\n";
}


# now do the checks for each host
foreach $host (@ARGV) {

    # fetch the info from the computers
    @disklist = get_values($host);
    # make sure we got an OK return value from get_values before going any further
    next unless (@disklist) && (ref($disklist[0]) eq "ARRAY");

    # Now check each partition 
    foreach $disk (@disklist) {
        undef $checkval ;
        undef $icheckval ;

        # Go through the config file line by line until we 
        # find a match for this host/partition. Stop as soon
        # as we find a match.
        foreach $cfgline (@cfgfile) {
            if ( ($host =~ m/^$cfgline->[0]$/) &&
                 ($disk->[2] =~ m/^$cfgline->[1]$/)
                 ) {
                print STDERR "'$host' matched /^$cfgline->[0]\$/ and '$disk->[2]' matched /^$cfgline->[1]\$/, using checkval $cfgline->[2]\n" if $DEBUG;
                $checkval = $cfgline->[2] ;
                $icheckval= $cfgline->[3] ;
                last;
            }
        }

        # Set to default otherwise
        $checkval = $DISKFREE unless defined($checkval);
        $icheckval= $INODEFREE unless defined($icheckval);
        $icheckval=~ s/%$//;

        # do the checking, first absolute and then percentage
        next if $checkval == 0 && $icheckval == 0;  # nothing to check: ignore
        my $hostfailed= 0;
        if (($checkval > 0) && ($disk->[0] <$checkval)) {
                $hostfailed++;
                push (@ERRS,sprintf("%s: filesystem %s is (%1.1f%% full), %1.0fMB free (below threshold %1.0fMB free)",
                    $host , $disk->[2] , $disk->[1] , $disk->[0] / 1048576, $checkval / 1048576 ));
        } elsif (($checkval < 0) && ($disk->[1] - $checkval >=100)) {
                $hostfailed++;
                push (@ERRS,sprintf("%s: filesystem %s is (%1.1f%% full), %1.0fMB free (below threshold %s%% free)",
                    $host , $disk->[2] , $disk->[1] , $disk->[0] / 1048576, abs($checkval) ));
        }

        if (($icheckval > 0) && ($disk->[3] ne "N/A") && (100 - $disk->[3]) < $icheckval ) {
                $hostfailed++;
                push (@ERRS, sprintf ("%s: filesystem %s has %1.1f%% inodes free (below threshold %s%% inodes free)",
                $host, $disk->[2], 100 - $disk->[3], $icheckval ));
        }
                
        if ($hostfailed)
        {
                push (@HOSTS, $host);
                $RET = 1;
        }

        # if the user wants a listing, then the user will get a listing :-)
        write if ($LIST or $LISTALL);
        if ($LISTALL) {
            printf("   Will alarm if MB free declines below threshold %1.0fMB free\n", $checkval / 1048576) if $checkval > 0;
            printf("   Will alarm if %%free space declines below threshold %1.1f%% free\n", abs($checkval)) if $checkval < 0;
            printf("   No free space alarm defined in config file.\n") if $checkval == 0;
            printf("   Will alarm if %%free inodes declines below %1.1f%%\n", $icheckval) if $icheckval > 0;
            printf("   No %%inodes free alarm defined in config file.\n") if $icheckval == 0;
            printf("   WARNING: Unable to alarm on inodes free, dskPercentNode not found in MIB\n") if $disk->[3] eq "N/A" and $icheckval > 0;
        }
        }
}

if ($LIST or $LISTALL) {
    print "\n\n";
}

# Uniq the array of failures, so multiple failures on a single host
# are reported in the details section (lines #2-infinity) but not
# in the summary (line #1).
# Then print out the failures, if any.
my %saw;
undef %saw;
@saw{@HOSTS} = ();
@HOSTS = keys %saw;


if ($RET) {
    print "@HOSTS\n";
    print "\n";
    print join("\n", @ERRS), "\n";
}

exit $RET;



#
# read configuration file
#
sub readcf {
    my ($f) = @_;
    my ($host, $filesys, $free, $ifree, $bytes);

    open (CF, $f) || return undef;
    while (<CF>) {
        next if (/^\s*#/ || /^\s*$/);
        chomp;
        ($host, $filesys, $free, $ifree) = split;
        # toBytes returns undef for a malformed free-space specification
        if (!defined ($bytes = toBytes ($free))) {
            die "error free specification, config $f, line $.\n";
        }
        push (@cfgfile, [$host , $filesys , $bytes, $ifree || 0]);
        print STDERR "cf: assigned host=$host, filesys=$filesys, free=$free, ifree=$ifree\n" if $DEBUG;
    }
    close (CF);
    return 1;
}



sub toBytes {
    # take a string and parse it as follows
    # N      return N
    # N kb   return N*1024
    # N mb   return N*1024^2
    # N gb   return N*1024^3
    # N %    return -N
    my ($free) = @_;
    my ($n, $u);

    if ($free =~ /^(\d+\.\d+)(kb|mb|gb|%|)$/i) {
        ($n, $u) = ($1, "\L$2");
    } elsif ($free =~ /^(\d+)(kb|mb|gb|%|)$/i) {
        ($n, $u) = ($1, "\L$2");
    } else {
        return undef;
    }

    return (int ($n * -1)) if ($u eq "%");
    return (int ($n * 1024 )) if ($u eq "kb");
    return (int ($n * 1024 * 1024)) if ($u eq "mb");
    return (int ($n * 1024 * 1024 * 1024)) if ($u eq "gb");
    int ($n);
}


#
# Do the work of trying to get the data from the host via SNMP
#
sub get_values {
    my ($host) = @_;

    my (@disklist,$Type,$Descr,$AllocationUnits,$Size,$Used,$Freespace,$Percent,$InodePercent);
    my ($v,$s);
    
    
    if (!defined($s = new SNMP::Session (DestHost => $host,
                Timeout => $TIMEOUT, Community => $COMM,
                Retries => $RETRIES))) {
        $RET = ($RET == 1) ? 1 : 2 ;
        push (@HOSTS, $host);
        # $s is undefined here, so $s->{ErrorStr} cannot be dereferenced
        push (@ERRS, "$host: could not create SNMP session");
        return undef;
    }

    # First we try to use the Host mib (RFC1514)
    # supported by net-snmpd on most platforms, see http://www.net-snmp.org
    #
    # You can also use the Empire (http://www.empire.com) 
    # SNMP agent to provide hostmib support on UNIX and NT.

    if ($USEMIB =~ /host/i)
    {

            $v = new SNMP::VarList (
                ['hrStorageIndex'],
                ['hrStorageType'],
                ['hrStorageDescr'],
                ['hrStorageAllocationUnits'],
                ['hrStorageSize'],
                ['hrStorageUsed'],
            );


            while (defined $s->getnext($v)) {
                last if ($v->[0]->tag !~ /hrStorageIndex/);
                
                $Type               = $v->[1]->val;
                $Descr              = $v->[2]->val;
                $AllocationUnits    = $v->[3]->val;
                $Size               = $v->[4]->val;
                $Used               = $v->[5]->val;

                $Freespace = (($Size - $Used) * $AllocationUnits);
                print STDERR "Found HOST MIB filesystem: Type=$Type, Descr=$Descr, AllocationUnits=$AllocationUnits, Size=$Size, Used=$Used\n" if $DEBUG;

                # This next check makes sure we're only looking at storage
                # devices of the "FixedDisk" type (4). For comparison, Physical
                # RAM is 2, Virtual Memory is 3, Floppy Disk is 6, and CD-ROM
                # is 7.
                # Using the Empire agent, this will eliminate drive types other
                # than hard disks. The UCD agent is not as good at determining
                # drive types under the HOST mib.
                next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);
                
                if ($Size != 0) {
                    $Percent= ($Used / $Size) * 100.0;
                } else {
                    $Percent=0;
                };
                
                push (@disklist,[$Freespace,$Percent,$Descr, "N/A"]);
                print STDERR "Using HOST MIB filesystem:  $Descr ($Type)\n" if $DEBUG;
            };

            if (@disklist) {
                return @disklist;
            };
    };

    # Then we test the perfmib from M$ NT resource kit
    # I'm using the agent/mib-defs from
    # http://www.wtcs.org/snmp4tpc/
    
    # for some reason every second request fails,
    # so we fetch the variables twice and discard
    # the bad ones

    if ($USEMIB =~ /perf/i)
    {

            $v = new SNMP::VarList (
                ['ldisklogicalDiskIndex'],
                ['ldiskPercentFreeSpace'],
                ['ldiskPercentFreeSpace'],
                ['ldiskFreeMegabytes'],
                ['ldiskFreeMegabytes'],
            );

            while (defined $s->getnext($v)) {
                # Make sure we are still in relevant portion of MIB
                last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);
                last if ($v->[0]->val =~ /Total/);
                
                $Descr          = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ;
                $Percent        = $v->[2]->val;
                $Freespace      = $v->[4]->val * 1024 * 1024;
                
                push (@disklist,[$Freespace,$Percent,$Descr, "N/A"]);
                print STDERR "Using PERF MIB filesystem: $Descr, $Freespace,$Percent\n" if $DEBUG;
            };
            if (@disklist) {
                return @disklist;
            }
    }
    
    #Try UCD-SNMP .enterprises.ucdavis.dskTable.dskEntry MIB extension
    # Comes with UCD-SNMP / net-snmp

    if ($USEMIB =~ /ucd/i)
    {
            $v = new SNMP::VarList (
                ['dskIndex'],
                ['dskPath'],
                ['dskPercent'],
                ['dskAvail'],
                ['dskDevice'],
                ['dskPercentNode'],
            );

            while (defined $s->getnext($v)) {
                last if ($v->[0]->tag !~ /dskIndex/);  # end of MIB?

                my $instancenum= $v->[0]->iid;  # what instance number?

                # check for partial fetches (like swap partition) that won't
                # return all the MIB entries

                if ($v->[2]->iid != $instancenum or
                    $v->[3]->iid != $instancenum or
                    $v->[5]->iid != $instancenum)
                {
                    # ignore this instance and try to move on to next
                    # we wouldn't need this if use-dummy-values really worked
                    $v = new SNMP::VarList (
                        ['dskIndex', $instancenum],
                        ['dskPath', $instancenum],
                        ['dskPercent', $instancenum],
                        ['dskAvail', $instancenum],
                        ['dskDevice', $instancenum],
                        ['dskPercentNode', $instancenum],
                    );
                    next;
                }

                $Descr              = $v->[1]->val;
                $Percent           = $v->[2]->val;
                $Freespace         = $v->[3]->val;
                $Freespace         *= 1024; #Convert from kbytes to bytes to make consistent
                $Type              = $v->[4]->val;
                $InodePercent      = $v->[5]->val;

                print STDERR "Found UCD MIB filesystem: Type=$Type, Descr=$Descr, Percent=$Percent, Freespace=$Freespace, InodePercent=$InodePercent\n" if $DEBUG;

                # Try to catch only local filesystems. This covers the
                # the basics, but probably should be configurable
                next unless ( $Type =~ m/\b(md|hd|wd|sd|ida|raid)/ ) ;

                print STDERR "Using UCD MIB filesystem:  $Descr ($Type)\n" if $DEBUG;
                push (@disklist,[$Freespace,$Percent,$Descr, $InodePercent]);
            };

            if (@disklist) {
                return @disklist;
            }
    }

    #Check for errors
    if ($s->{ErrorNum}) {
        push (@HOSTS, $host);
        push (@ERRS, "$host: could not get SNMP info: " . $s->{ErrorStr});
        $RET = ($RET == 1) ? 1 : 2 ;
        return undef;
    }

    # Check for OID not found
    push (@HOSTS, $host);
    push (@ERRS, "$host: Disk space OIDs not found in MIB(s): $USEMIB");
    $RET = ($RET == 1) ? 1 : 2 ;
    return undef;

}

# format specifications, should be able to cut, paste and edit into a config file
format STDOUT_TOP =
System           Description                    % Used    Free space     Inode%
-------------------------------------------------------------------------------
.

format STDOUT =
@<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<  @###.# %  @#######.# mb @>>>>>>
$host,           $disk->[2],                    $disk->[1], $disk->[0]/1024/1024,  ( $disk->[3] ne "N/A" ? ($disk->[3] + 0) . "%" : "N/A")
.
#
# snmpdiskspace.cf- configuration file for snmpdiskspace.monitor
#
# format:
#
# host filesys free ifree
#
# The monitor script uses a "first match" algorithm. So put your more
# specific directives at top, and leave the more general directives
# for the bottom.
#
#
# host     Regex describing the name of the host(s). Remember to escape
#          dots if you're fully qualifying hostnames, e.g.,
#          some\.domain\.com, otherwise you might not be matching what
#          you think you're matching.
#
# filesys  Regex describing the filesystem to check, as represented 
#          in the relevant mib (after mangling by the monitor).
#          Remember to use regex syntax, and not file glob syntax.
#
# free     The amount of free space which will trigger a failure,
#          expressed as "10", "10kb", "10MB", or "10GB" for
#          bytes, kilobytes, megabytes or gigabytes. The format
#          "10%" signifies percent of the total disk space.
#          "0" turns off checking for the filesystem/disk.
#
# ifree    Percentage of free inodes, below which will trigger a failure.
#          Expressed as "5%".  The host must support the UCD dskTable MIB.
#
#
#   BE SURE TO TEST your configuration with the "--listall" option!
#   This way, you will see exactly what filesystems are found by the script,
#   and what their alarm thresholds will be.
#
#          Examples:
#          *    *       5%
#          Give a warning when the free space goes below 5 %
#          (This is the default behavior of the monitor)
#          This should always be the last line in your config file
#          because it will match everything.
#
#          *    *       5%      10%
#          As above, but also warn if free inodes drops below 10%.
#
#          ior  *       15%
#          On the host ior the limit is 15%
#
#          poo  /       1gb
#          poo's root should have a full gig free
#
#          www[1-4]     *       500mb
#          any partition on the machines www1, www2, www3, and www4
#          should have at least 500mb free.
#
#          *    /cdrom/.*       0
#          anything that is mounted on /cdrom will be full anyway
#          At least for Solaris, you need a regex like this because
#          vold mounts each new CD on a new partition, and you
#          won't know its name until you put it into the drive.
#
#
# Always ignore anything on cdrom partitions
*       /cdrom.*        0
*       /mnt    0
#
#
# This line always should be last because it matches everything.
*       *       5%
_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
