On Mon, Mar 20, 2000 at 10:29:28AM -0500, Seth Vidal wrote:
>[...] 
> On the subject of semi-silent failures: Has anyone written a script to
> monitor the [UUUUU]'s in the /proc/mdstat location? It would be fairly
> trivial (start beeping the system speaker loudly and emailing
> repetitively) Has this already been or should I work on it?

OK, as usual I've taken somebody else's idea and run with it.  Here's
a little daemon script which monitors the RAID arrays for failures:

---SOF---
#!/bin/bash

PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"

ADMIN_EMAIL="root@localhost"

RAID_STAT="/proc/mdstat"

while true
do
 sleep 3
 if [ -n "`cat $RAID_STAT | perl -ne 'if (/(.*\[U*[^\]\[U]+U*\])$/) { print \"Failure! $1\n\"; }'`" ]
 then
        cat $RAID_STAT | mail -s "**** Raid Failure Warning ****" $ADMIN_EMAIL
        sleep 600
 fi
done
---EOF---

(Watch for wrapped lines.  There are a couple long ones.)

If you want to test it, then do this:

cat /proc/mdstat > testfile

Then change the RAID_STAT line in the script to RAID_STAT="testfile".

Then, start the daemon and let it run.  After a while, simulate a
failure by editing 'testfile' (for example, change one U in the
[UU...] field to an underscore).  If the warning mail arrives, kill
the daemon and point RAID_STAT back to the real file (/proc/mdstat).

To use it in rc.local (or wherever), just start it like this:

su nobody raid-monitor.sh &

(Or change 'nobody' to some other unprivileged user of your
choice.)


Phil

-- 
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
   [EMAIL PROTECTED] -- http://www.netroedge.com/~phil
 PGP F16: 01 D2 FD 01 B5 46 F4 F0  3A 8B 9D 7E 14 7F FB 7A
