>Number:         152859
>Category:       misc
>Synopsis:       [new port] net-mgmt/nagios-check_hdd_health, a Nagios plug-in written in shell to check your HDD health using SmartMonTools
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 06 12:10:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Marian Jamrich
>Release:        8.2 prerelease
>Organization:
>Environment:
>Description:
check_hdd_health is a Nagios plug-in written in shell to check your HDD health 
using SmartMonTools.
The script checks the following S.M.A.R.T. values:

- Spin Retry Count
- Reallocated Sector Ct
- Reallocated Event Count
- Current Pending Sector
- Offline Uncorrectable
- Total health test
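
As a rough illustration of how such values can be read, here is a minimal
sketch (not the code from the attached script). It assumes the attribute
table layout printed by "smartctl -A", where the raw value is the 10th
column. The helper names smart_raw and sample_table are hypothetical, and
the sample values are made up so the sketch runs without a real disk.

```shell
#!/bin/sh
# Sketch: print the raw value (10th column) of one S.M.A.R.T. attribute.
# smart_raw is a hypothetical helper, not part of the submitted plug-in.
# $1 = attribute name, stdin = attribute table in "smartctl -A" layout.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Made-up sample table standing in for real smartctl output.
sample_table() {
    cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
EOF
}

# Print name=value for each attribute the plug-in looks at.
for attr in Spin_Retry_Count Reallocated_Sector_Ct Reallocated_Event_Count \
            Current_Pending_Sector Offline_Uncorrectable; do
    echo "$attr=$(sample_table | smart_raw "$attr")"
done
```

An attribute that the disk does not report yields an empty string, which is
the case the "--without" option of the plug-in is meant to cover.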


>How-To-Repeat:

>Fix:


Patch attached with submission follows:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#       check_hdd_health
#
echo x - check_hdd_health
sed 's/^X//' >check_hdd_health << '53eb126359c9c0d8f2d23c32c84ef809'
X#!/bin/sh
X#
XPATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/usr/local/bin
X
XST_OK=0
XST_WR=1
XST_CR=2
XST_UN=3
X
Xsmartctl=$(which smartctl)
X
X## Smartmontools
XSMT=Smartmontools
X
X# Plugin name
XPROGNAME=`basename $0`
X            
X# Version
XVERSION="Version 1.0"
X        
X# Author
XAUTHOR="Marian Jamrich"
X
XTMPFILE=$(mktemp /tmp/smart.nagios.XXXXXX) || exit $ST_UN
X
X# Clean up when done or when aborting
Xtrap "rm -f ${TMPFILE}" 0 1 2 3 15
X
X# Print the plug-in name and version (used by -V)
Xprint_version() {
X    echo "$PROGNAME $VERSION $1"
X}
X
Xmini_help() {
X        echo "Usage: $0 --device $device [ --without [src rsc rec cps ou]]"
X}
X
Xprint_help() {
X    clear;
X    echo "*********************************************************************************"
X    echo "* $PROGNAME $VERSION $1""($AUTHOR) <[email protected]> (2010) *"
X    echo "*********************************************************************************"
X    echo "This is a Nagios plugin to check HDD health from S.M.A.R.T. using Smartmontools."
X    echo '
XThe S.M.A.R.T. attributes are specific properties (parameters) of various parts of a disk.
XS.M.A.R.T. uses attributes to monitor the disk condition and to analyze its reliability.
X
XThe script checks the following S.M.A.R.T. properties (if your HDD supports them):
X
X** Spin Retry Count (src) **
XCount of retried spin start attempts. This attribute stores the total count of spin start attempts needed to reach the fully operational speed (under the
Xcondition that the first attempt was unsuccessful). A decrease of this attribute value is a sign of problems in the hard disk mechanical subsystem.
X
X** Reallocated Sector Count (rsc) **
XCount of reallocated sectors. When the hard drive finds a read/write/verification error, it marks the sector as "reallocated" and transfers the data to a
Xspecial reserved area (spare area). This process is also known as remapping, and "reallocated" sectors are called remaps. This is why, on modern hard
Xdisks, you cannot see "bad blocks" while testing the surface: all bad blocks are hidden in reallocated sectors.
X
X** Reallocated Event Count (rec) **
XCount of remap operations (transferring data from a bad sector to a special reserved disk area, the spare area). The raw value of this attribute shows the
Xtotal number of attempts to transfer data from reallocated sectors to a spare area. Unsuccessful attempts are counted as well as successful ones.
X
X** Current Pending Sector (cps) **
XCurrent count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping.
XLater, when some of these sectors are read successfully, the value is decreased. If errors still occur when reading a sector, the hard drive will try
Xto restore the data, transfer it to the reserved disk area (spare area) and mark the sector as remapped. If this attribute value stays above zero, it
Xindicates that the quality of the corresponding surface area is low.
X
X** Offline Uncorrectable (ou) **
XQuantity of uncorrectable errors. The raw value of this attribute indicates the total number of uncorrectable errors when reading/writing a sector.
XA rise in the value of this attribute indicates that there are evident defects of the disk surface and/or there are problems in the hard disk drive
Xmechanical subsystem.
X
X** Total health test (pass) **
XThis is the overall health test provided by Smartmontools. If the overall disk state is healthy, Smartmontools reports "PASSED".
X        '
X    echo "Nagios states:"
X    echo
X    echo "OK - if all values are \"0\"."
X    echo "Warning - if one or both of \"Spin Retry Count\" and \"Reallocated Event Count\" are between 1 and 9."
X    echo "Critical - if the total health test fails or any value is greater than \"0\", except \"Spin Retry Count\" and \"Reallocated Event Count\", which are critical at 10 or more."
X    echo -e "\n---------------------------------------------------------------------"
X    echo "Usage:"
X    echo "$0 --device /dev/ad0 [ --without [src rsc rec cps ou]]"
X    echo "---------------------------------------------------------------------"
X    exit $ST_UN
X}
X
Xcase "$1" in
X        --help|-h|--usage|-u)
X            print_help                                              
X            exit $ST_UN
X            ;;
X        -d | --device)
X            device=$2
X            ;;
X        -V)
X            print_version
X            exit
X            ;;
X        *)
X            echo "Unknown argument: $1"
X            echo "For more information please try -h or --help!"
X            exit $ST_UN
X            ;;
Xesac
Xshift
X
Xtest -z "$device" && echo -e "\nYou forgot to define a device! Please try \"-h\" or \"--help\" for help." && exit $ST_UN
Xtest "`uname`" != "FreeBSD" && echo "This plugin is only for FreeBSD." && exit $ST_UN
X
Xif [ ! -e "$device" ]; then
X        echo
X        echo "Unknown device \"$device\"!"
X        exit $ST_UN
Xfi
X
Xif [ -z "$smartctl" ]; then
X        echo -e "\n$SMT is not installed. Please install it from http://smartmontools.sourceforge.net or with \"pkg_add -r smartmontools\"."
X        exit $ST_UN
Xfi
X
X$smartctl -a $device > ${TMPFILE}
XSMART_SUPPORT=`awk '/SMART support is/ {print $4}' ${TMPFILE} | tail -n 1`
X
Xif [ "${SMART_SUPPORT}" = "Unavailable" ]; then
X        echo -e "\nS.M.A.R.T. support is unavailable for $device! You can try to enable it with \"smartctl -s on $device\"."
X        exit $ST_UN
Xelif [ "${SMART_SUPPORT}" != "Enabled" ]; then
X        echo -e "\nS.M.A.R.T. support may not be enabled for $device in $SMT. Run \"smartctl -s on $device\" to turn it on. Alternatively, the device may not support S.M.A.R.T."
X        exit $ST_UN
Xfi
X
X## start S.M.A.R.T test and set variables
Xsrc=`awk '/Spin_Retry_Count/ {print $10}' ${TMPFILE} `
Xrsc=`awk '/Reallocated_Sector_Ct/ {print $10}' ${TMPFILE} `
Xrec=`awk '/Reallocated_Event_Count/ {print $10}' ${TMPFILE} `
Xcps=`awk '/Current_Pending_Sector/ {print $10}' ${TMPFILE} `
Xou=`awk '/Offline_Uncorrectable/ {print $10}' ${TMPFILE} `
Xpass=`awk -F\: '/test result/ { if ( $2 == " PASSED")  print "PASSED"; else print "FAILED" }' ${TMPFILE} `
X
X## If one or more S.M.A.R.T. attributes are not supported by your HDD, list them after --without; their values are then set to "0".
X## The loop below scans the remaining positional parameters directly, so no getopt call is needed.
Xfor arg; do
X        case "$arg" in
X                src) src=0;;
X                rsc) rsc=0;;
X                rec) rec=0;;
X                cps) cps=0;;
X                ou) ou=0;;
X        esac
Xdone
X
X# test if your HDD support all parameters:
X[ -z "$src" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Spin_Retry_Count. Please try \"--without src\"." && mini_help && exit $ST_UN
X[ -z "$rsc" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Sector_Ct. Please try \"--without rsc\"." && mini_help && exit $ST_UN
X[ -z "$rec" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Event_Count. Please try \"--without rec\"." && mini_help && exit $ST_UN
X[ -z "$cps" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Current_Pending_Sector. Please try \"--without cps\"." && mini_help && exit $ST_UN
X[ -z "$ou" ]  && echo -e "***********\n** ERROR **\n***********\n${device} does not support Offline_Uncorrectable. Please try \"--without ou\"." && mini_help && exit $ST_UN
X
X# Nagios performance data: space-separated label=value pairs, numeric values only
Xperfdata="src=$src rsc=$rsc rec=$rec cps=$cps ou=$ou"
X
X##### finally run test, print result and set exit code #####
Xif [ $src -eq 0 ] && [ $rsc -eq 0 ] && [ $rec -eq 0 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X        echo "OK - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_OK
X# WARNING, as described in the help text: only src and/or rec are non-zero, and both are below 10
Xelif [ $src -lt 10 ] && [ $rsc -eq 0 ] && [ $rec -lt 10 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X        echo "WARNING - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_WR
Xelse
X        echo "CRITICAL - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_CR
Xfi
53eb126359c9c0d8f2d23c32c84ef809
exit
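
The "Nagios states" rules printed by the help text of the script above can
be read as a small decision function. The following is only a sketch of
that reading, not the submitted code: classify is a hypothetical helper
that takes the five raw values and the health test result and prints the
state name. It assumes all five values are plain non-negative integers.

```shell
#!/bin/sh
# classify: hypothetical helper mirroring the states described in the
# help text. Arguments: src rsc rec cps ou pass
# The body runs in a subshell so the variables do not leak to the caller.
classify() (
    src=$1; rsc=$2; rec=$3; cps=$4; ou=$5; pass=$6
    if [ "$pass" != "PASSED" ] || [ "$rsc" -gt 0 ] || [ "$cps" -gt 0 ] || [ "$ou" -gt 0 ]; then
        # A failed health test or any non-zero rsc/cps/ou is critical.
        echo CRITICAL
    elif [ "$src" -ge 10 ] || [ "$rec" -ge 10 ]; then
        # src and rec become critical at 10 or more.
        echo CRITICAL
    elif [ "$src" -gt 0 ] || [ "$rec" -gt 0 ]; then
        # src and/or rec between 1 and 9 is a warning.
        echo WARNING
    else
        echo OK
    fi
)

# Usage example with made-up values.
echo "all zero:      $(classify 0 0 0 0 0 PASSED)"
echo "three retries: $(classify 3 0 0 0 0 PASSED)"
echo "one remap:     $(classify 0 1 0 0 0 PASSED)"
```

Keeping the decision in one function makes it easy to map the printed state
to the Nagios exit codes 0, 1 and 2 in a single place.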

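
The script above accepts "--device DEV" followed by an optional
"--without" list of attribute short names. One possible way to parse that
command line in plain sh is sketched below. This is an assumption about
how it could be done, not the method used in the submission: parse_args
and the variable "without" are hypothetical names introduced here.

```shell
#!/bin/sh
# parse_args: hypothetical helper. Sets $device to the device path and
# $without to a space-separated list of attribute names to skip.
# Returns 3 (the Nagios UNKNOWN code) on a malformed command line.
parse_args() {
    device=
    without=
    while [ $# -gt 0 ]; do
        case "$1" in
            -d|--device)
                # The option needs a value after it.
                [ $# -ge 2 ] || return 3
                device=$2
                shift 2
                ;;
            --without)
                shift
                # Collect names until the next option or the end.
                while [ $# -gt 0 ]; do
                    case "$1" in
                        -*) break ;;
                        *)  without="${without:+$without }$1"; shift ;;
                    esac
                done
                ;;
            *)
                echo "Unknown argument: $1" >&2
                return 3
                ;;
        esac
    done
}

# Usage example.
parse_args --device /dev/ad0 --without src rec &&
    echo "device=$device without=$without"
```

Collecting the names into one variable keeps the option parsing separate
from the step that later sets the skipped attributes to 0.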


>Release-Note:
>Audit-Trail:
>Unformatted:
