On Mon, Mar 20, 2000 at 10:29:28AM -0500, Seth Vidal wrote:
>[...] 
> On the subject of semi-silent failures: Has anyone written a script to
> monitor the [UUUUU]'s in the /proc/mdstat location? It would be fairly
> trivial (start beeping the system speaker loudly and emailing
> repetitively) Has this already been or should I work on it?

Hmm, good idea!  I wrote a simple shell script a while ago to monitor
lm-sensors readings (fan speeds, power supplies, temperatures, etc.;
see www.lm-sensors.nu).  It wasn't too hard.  I just started it in the
background from rc.local, and it would email me if things went
awry.

Here's what it looked like:

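---SOF---
#!/bin/sh
# Poll lm-sensors and mail the admin when an alarm is raised.
PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"

ADMIN_EMAIL="root@localhost"

# Refuse to start if an alarm is already pending.
if sensors | grep -q ALARM
then
        echo "Pending Alarms on start up!  Exiting!"
        exit 1
fi

while true
do
        sleep 3
        # Update the LCD panel (site-specific helper scripts).
        sensors | /root/bin/getsensors.pl | /root/bin/displayit.pl
        if sensors | grep -q ALARM
        then
                sensors | mail -s "**** Hardware Health Warning ****" "$ADMIN_EMAIL"
                # Wait 10 minutes before checking (and mailing) again.
                sleep 600
        fi
done
---EOF---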

Btw, the line just after the 'sleep 3' displays the readings on an LCD
panel in a 5.25" drive, so ignore that.

Notice that it checks every 3 seconds, but once an alarm is raised it
emails at most every 10 minutes (prevents the inbox from filling up
overnight).
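
One side effect of the 'sleep 600' is that the loop also stops polling
for those ten minutes. A possible variant, sketched here only as an
illustration, keeps polling and remembers when the last mail went out.
The names should_notify, INTERVAL and last_sent are made up for this
sketch, and it assumes a date(1) that supports +%s, which is common but
not guaranteed everywhere.

```shell
#!/bin/sh
# Sketch only: should_notify, INTERVAL and last_sent are illustrative names.
INTERVAL=600      # minimum number of seconds between two mails
last_sent=0       # epoch time of the last mail; 0 means "never sent"

# Succeeds if at least INTERVAL seconds separate $1 (now) from $2 (last mail).
should_notify() {
    [ $(( $1 - $2 )) -ge "$INTERVAL" ]
}

# Inside the monitoring loop one might then write:
#   now=`date +%s`
#   if should_notify "$now" "$last_sent"; then
#       sensors | mail -s "**** Hardware Health Warning ****" "$ADMIN_EMAIL"
#       last_sent=$now
#   fi
```

The check itself still runs every 3 seconds; only the mail is held back.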

What does it look like when a drive dies?  I presume something like:

[..UD]

Then, perhaps just doing a (Perl) regexp: if (/\[[^\]]*D[^\]]*\]/)
then report the failure?
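
The same test could also be done in plain shell. The sketch below
assumes only that a healthy member shows up as 'U' inside the bracketed
status field, and treats any other capital letter or underscore there
as a problem; the Linux kernels I know of print '_' for a missing
member, so the 'D' above is a guess. The names md_degraded and MDSTAT
are invented for the example.

```shell
#!/bin/sh
# Sketch only: md_degraded and MDSTAT are illustrative names.
MDSTAT="${MDSTAT:-/proc/mdstat}"

# Succeeds (exit 0) if the file named by $1 contains a status field such
# as [UU_U], i.e. a bracketed run of capitals/underscores with at least
# one character that is not U. Fields like [raid5], [4/3] or sda1[0] do
# not match because they contain lowercase letters or digits.
md_degraded() {
    LC_ALL=C grep -Eq '\[[A-Z_]*[A-TV-Z_][A-Z_]*\]' "$1"
}

if [ -r "$MDSTAT" ] && md_degraded "$MDSTAT"
then
    echo "RAID array degraded"   # or: mail -s "..." "$ADMIN_EMAIL" < "$MDSTAT"
fi
```

Dropping that check into the 'while true' loop next to the sensors test
would cover both cases with one script.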


Phil

-- 
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
   [EMAIL PROTECTED] -- http://www.netroedge.com/~phil
 PGP F16: 01 D2 FD 01 B5 46 F4 F0  3A 8B 9D 7E 14 7F FB 7A
