On Mon, Mar 20, 2000 at 10:29:28AM -0500, Seth Vidal wrote:
>[...]
> On the subject of semi-silent failures: Has anyone written a script to
> monitor the [UUUUU]'s in the /proc/mdstat location? It would be fairly
> trivial (start beeping the system speaker loudly and emailing
> repetitively). Has this already been done, or should I work on it?
OK, as usual I've taken somebody else's idea and run with it. Here's
a little daemon script that monitors the RAID arrays for failures:
---SOF---
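#!/bin/bash
# raid-monitor.sh: mail the admin when /proc/mdstat shows a failed member.
PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"
ADMIN_EMAIL="root@localhost"
RAID_STAT="/proc/mdstat"
while true
do
  sleep 3
  # A healthy array shows only U's, e.g. [UUUUU]. This check assumes a
  # failed member is shown as '_', e.g. [UU_UU].
  if grep -Eq '\[[U_]*_[U_]*\]' "$RAID_STAT"
  then
    mail -s "**** Raid Failure Warning ****" "$ADMIN_EMAIL" < "$RAID_STAT"
    # Wait 10 minutes before checking again, so the mailbox isn't flooded.
    sleep 600
  fi
done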
---EOF---
(Watch for wrapped lines. There are a couple long ones.)
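If you'd like the warning to say which array is in trouble, something along these lines might do. It is only a rough sketch: the function name degraded_arrays and the sample text are made up for illustration, and it assumes a failed member shows up as '_' in the status field at the end of the line.

```shell
#!/bin/bash
# degraded_arrays FILE: print the name of every md device in FILE whose
# status field contains a '_' (assumed here to mean a failed member).
# The function name is hypothetical, not part of any existing tool.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/            { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\][ \t]*$/ { print name }    # status field has a "_"
    ' "$1"
}

# Made-up sample in the style of /proc/mdstat, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
md0 : active raid1 hdc1[1] hda1[0]
      1048512 blocks [2/2] [UU]
md1 : active raid1 hdd1[1] hdb1[0]
      2096384 blocks [2/1] [_U]
EOF

degraded_arrays "$sample"    # prints: md1
```

The output could then be put into the mail subject instead of a fixed string.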
If you want to test it, then do this:
cat /proc/mdstat > testfile
Then change the RAID_STAT line in the script to RAID_STAT="testfile".
Then start the daemon and let it run. After a while, simulate a
failure by editing 'testfile' (for example, change one U in a [UU...]
field to _). If the warning mail arrives, kill the daemon and
point RAID_STAT back to the real file (/proc/mdstat).
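The "simulate the failure" step can also be done without an editor. A rough sketch, under a couple of assumptions: it uses a made-up sample in place of a copy of the real /proc/mdstat so that it runs anywhere, and it assumes a failed member is shown as '_' in the status field.

```shell
#!/bin/bash
# Work on a scratch file so the real /proc/mdstat is never involved.
testfile=$(mktemp)

# Stand-in for: cat /proc/mdstat > testfile   (made-up sample contents)
cat > "$testfile" <<'EOF'
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
EOF

# Simulate a failure: turn the last U of the status field into '_'.
sed 's/U]$/_]/' "$testfile" > "$testfile.new" && mv "$testfile.new" "$testfile"

# Show the line a monitor would now have to notice.
grep '_' "$testfile"
```

With RAID_STAT pointing at that file, a running monitor should pick up the change on its next pass.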
To use it in rc.local (or wherever), just start it like this:
su nobody raid-monitor.sh &
(Or change 'nobody' to some other unprivileged user of your
choice.)
Phil
--
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
[EMAIL PROTECTED] -- http://www.netroedge.com/~phil
PGP F16: 01 D2 FD 01 B5 46 F4 F0 3A 8B 9D 7E 14 7F FB 7A