On Mon, Mar 20, 2000 at 10:29:28AM -0500, Seth Vidal wrote:
>[...]
> On the subject of semi-silent failures: Has anyone written a script to
> monitor the [UUUUU]'s in the /proc/mdstat location? It would be fairly
> trivial (start beeping the system speaker loudly and emailing
> repetitively). Has this already been done, or should I work on it?
Humm, good idea! A while ago I wrote a simple shell script to monitor
lm-sensors data (fan speeds, power supply voltages, temperatures, etc.;
see www.lm-sensors.nu). It wasn't too hard. I just started it in the
background from rc.local, and it would email me if anything went
awry.
Here's what it looked like:
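---SOF---
#!/bin/sh
# Hardware health monitor for lm-sensors; start it in the background
# from rc.local.
PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"
ADMIN_EMAIL="root@localhost"

# Refuse to run if alarms are already pending at boot.
if sensors | grep -q ALARM
then
    echo "Pending Alarms on start up! Exiting!" >&2
    exit 1
fi

# Check every 3 seconds; after mailing a warning, wait 10 minutes.
while true
do
    sleep 3
    sensors | /root/bin/getsensors.pl | /root/bin/displayit.pl
    if sensors | grep -q ALARM
    then
        sensors | mail -s "**** Hardware Health Warning ****" "$ADMIN_EMAIL"
        sleep 600
    fi
done
---EOF---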
Btw, the line just after the 'sleep 3' drives an LCD panel mounted in a
5.25" drive bay, so ignore it.
Notice that it checks every 3 seconds, but emails at most once every
10 minutes while an alarm persists (this keeps the inbox from filling
up overnight).
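One side effect of the 'sleep 600' is that the loop (and the LCD update) stops for those ten minutes. A minimal sketch of an alternative, not what the script above does, keeps polling every 3 seconds and only rate-limits the mail by remembering when the last warning went out. It assumes a 'date' that understands '+%s' (GNU date does); the names should_mail, monitor, CHECK_INTERVAL, MAIL_INTERVAL and the --run flag are hypothetical.

```sh
#!/bin/sh
# Sketch (hypothetical names): poll every CHECK_INTERVAL seconds, but mail
# at most once per MAIL_INTERVAL seconds instead of sleeping through it.
PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"
ADMIN_EMAIL="root@localhost"
CHECK_INTERVAL=3
MAIL_INTERVAL=600

# should_mail NOW LAST: succeed if no warning has been sent yet (LAST=0)
# or if at least MAIL_INTERVAL seconds have passed since the last one.
should_mail() {
    t_now=$1
    t_last=$2
    [ "$t_last" -eq 0 ] || [ $((t_now - t_last)) -ge "$MAIL_INTERVAL" ]
}

monitor() {
    last_mail=0
    while true
    do
        sleep "$CHECK_INTERVAL"
        if sensors | grep -q ALARM
        then
            now=$(date +%s)   # seconds since the epoch (assumes GNU date)
            if should_mail "$now" "$last_mail"
            then
                sensors | mail -s "**** Hardware Health Warning ****" "$ADMIN_EMAIL"
                last_mail=$now
            fi
        fi
    done
}

# Start the loop only when asked, e.g. "sensormon.sh --run &" in rc.local.
if [ "${1:-}" = "--run" ]
then
    monitor
fi
```

The trade-off is one extra variable of state in exchange for never going blind to the sensors (or the LCD) during the backoff.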
What does it look like when a drive dies? I presume something like:
[..UD]
Then perhaps a (Perl) regexp such as
    /\[[^\]]*D[^\]]*\]/
could detect it and trigger the failure report?
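For the /proc/mdstat case, a sketch along the same lines as the sensors loop follows. Instead of a Perl regexp it uses grep to look for a status group made up only of U, _ and D characters with at least one non-U. As far as I know, md marks a failed or missing member with "_" (e.g. [U_]); D is included only to cover the guess above. Restricting the match to those characters avoids false alarms on other bracketed fields such as [raid1], [2/2] or device indices like sda1[0]. The names md_degraded and watch_md, the --run flag and the mail subject are hypothetical.

```sh
#!/bin/sh
# Sketch: watch /proc/mdstat and mail the admin when an array looks degraded.
PATH="/bin:/usr/bin:/usr/local/bin:${PATH}"
ADMIN_EMAIL="root@localhost"

# md_degraded [FILE]: print the lines whose status group (e.g. "[UU]")
# contains "_" or "D", and succeed if there were any. Other bracketed
# fields ("[raid1]", "[2/2]", "sda1[0]") never match because they contain
# characters outside U, D and _. FILE defaults to /proc/mdstat; a missing
# file makes grep fail, so it is treated as "not degraded".
md_degraded() {
    grep '\[[UD_]*[D_][UD_]*]' "${1:-/proc/mdstat}"
}

watch_md() {
    while true
    do
        if report=$(md_degraded)
        then
            printf '%s\n' "$report" |
                mail -s "**** RAID Degraded Warning ****" "$ADMIN_EMAIL"
            sleep 600    # back off so the inbox doesn't fill up
        else
            sleep 3
        fi
    done
}

# Start the loop only when asked, e.g. "mdwatch.sh --run &" in rc.local.
if [ "${1:-}" = "--run" ]
then
    watch_md
fi
```

Like the sensors script, it would be started in the background from rc.local; beeping the speaker could be added in the same branch that sends the mail.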
Phil
--
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
[EMAIL PROTECTED] -- http://www.netroedge.com/~phil
PGP F16: 01 D2 FD 01 B5 46 F4 F0 3A 8B 9D 7E 14 7F FB 7A