> -----Original Message-----
> From: "David Pinkerton H" <[EMAIL PROTECTED]>
> Sent: 30.10.08 22:55:22
> To: "'[email protected]'" <[email protected]>
> Subject: [Linux-HA] Heartbeat Cluster Monitoring
Hi David,
> Ideally if a resource is stopped I would like the monitoring system to
> confirm it restarts on a different node, if not page out. I do not want to
> be paged if the cluster successfully fails over (ie. working as designed)
The simplest approach, IMHO: save the output of 'crm_mon -r -1' from a
working cluster as a reference.
Then run crm_mon regularly and compare its output to the reference.
You can extract only those lines you're interested in (e.g. don't compare the
timestamps).
If something changed, send an alert.
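The reference comparison could be sketched roughly like this. It is untested
against a real cluster; the function names and the STATUS_CMD override are
made up for illustration, and the filter patterns for the timestamp lines are
an assumption you will have to adapt to your crm_mon output.

```shell
#!/bin/sh
# Command that prints the cluster state. STATUS_CMD is a hypothetical
# override so the logic can be tried without a cluster.
STATUS_CMD=${STATUS_CMD:-"crm_mon -r -1"}

# Drop volatile lines (timestamps) so only the real state is compared.
# The patterns are assumptions; adjust them to your crm_mon output.
filter_status() {
    grep -v -i -e '^Last updated' -e '^Last change'
}

# Save the filtered state of a known-good cluster to the file given as $1.
save_reference() {
    $STATUS_CMD | filter_status > "$1"
}

# Return 0 if the current filtered state equals the reference in $1,
# non-zero otherwise.
check_against_reference() {
    $STATUS_CMD | filter_status | diff "$1" - > /dev/null
}
```

Run save_reference /some/file once while the cluster is healthy, then call
check_against_reference /some/file from cron or a loop and send mail whenever
it returns non-zero.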
I'm pretty sure you should also be notified when a resource is moved to
another node.
In most cases something went wrong, and that should be investigated as early as
possible.
Chances are your cluster is no longer HA.
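If you want to know which resource moved, not just that something changed,
you could compare the resource-to-node placement. A rough sketch, assuming
resource lines look like "name (class::provider:type): Started node" (the
format differs between versions, so check yours); placement and
moved_resources are names I made up.

```shell
#!/bin/sh
# Read crm_mon output on stdin and print "resource node" for every started
# resource. Assumes the resource name is the first field of the line and
# the node name the last one.
placement() {
    awk '/Started/ { print $1, $NF }' | LC_ALL=C sort -k1,1
}

# Print every resource whose node differs between two placement files.
# $1 = old placement file, $2 = new placement file.
moved_resources() {
    LC_ALL=C join "$1" "$2" | awk '$2 != $3 { print $1 ": " $2 " -> " $3 }'
}
```

Usage would be something like 'crm_mon -1 -r | placement > old' while healthy,
later the same into 'new', then 'moved_resources old new'. Resources that
stopped completely drop out of the join, so this complements the started
count rather than replacing it.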
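But for your specific scenario, something like this would do:
# Number of started resources; run this while the cluster is healthy
OK_VALUE=$(crm_mon -1 -r | grep -ci started)
max_tries=10
tries=0
while true
do
    CURRENT=$(crm_mon -1 -r | grep -ci started)
    if [ "$OK_VALUE" -ne "$CURRENT" ]
    then
        # Count differs; give the cluster some time to fail over
        tries=$((tries + 1))
        # Mail exactly once, after more than max_tries bad checks in a row
        if [ "$tries" -eq $((max_tries + 1)) ]
        then
            echo "WARNING: Look at the cluster" | mail -s 'SHIT: Something bad happened to the cluster' [EMAIL PROTECTED]
        fi
    else
        # Everything is started again, reset the counter
        tries=0
    fi
    sleep 5
done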
Warning: Not tested in a shell, it's email-ware. ;-)
Best regards
Andreas Mock