Re: [Linux-HA] Heartbeat Cluster Monitoring

Andreas Mock Thu, 30 Oct 2008 16:23:03 -0700

> -----Original Message-----
> From: "David Pinkerton H" <[EMAIL PROTECTED]>
> Sent: 30.10.08 22:55:22
> To: "'[email protected]'" <[email protected]>
> Subject: [Linux-HA] Heartbeat Cluster Monitoring


Hi David,


> Ideally if a resource is stopped I would like the monitoring system to 
> confirm it restarts on a different node, if not page out.  I do not want to 
> be paged if the cluster successfully fails over (ie. working as designed)

The simplest approach, IMHO: save the output of 'crm_mon -r -1' as a reference 
while the cluster is in its normal working state.
Then run crm_mon regularly and compare its output to that reference.
You can extract only the lines you're interested in (e.g. leave out the 
timestamps).
If anything has changed, send an alert.
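
A minimal sketch of that comparison, not tested against a real cluster. The 
function names, the file paths in the usage comment and the assumption that 
every interesting line contains "Started" or "Stopped" are mine; adjust the 
filter to whatever your crm_mon actually prints.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that describe resource state.
# Assumption: those lines contain "Started" or "Stopped"; timestamps and
# other header lines are dropped so they cannot cause false alarms.
filter_state() {
    grep -i -E 'started|stopped' | sort
}

# Hypothetical helper: succeed (exit 0) if the filtered state in the
# reference snapshot ($1) equals the filtered state in the current one ($2).
same_state() {
    ref=$(filter_state < "$1")
    cur=$(filter_state < "$2")
    [ "$ref" = "$cur" ]
}

# Usage sketch (paths are placeholders):
#   crm_mon -r -1 > /var/tmp/crm_mon.reference      # once, cluster healthy
#   crm_mon -r -1 > /var/tmp/crm_mon.current        # on every check
#   same_state /var/tmp/crm_mon.reference /var/tmp/crm_mon.current \
#       || echo "cluster state changed"
```

Because the node name is part of each resource line, this also fires when a 
resource comes up on a different node, not only when it stays stopped.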

I'm pretty sure you should also be notified when a resource is moved to 
another node.
In most cases something went wrong, which has to be investigated as early as 
possible.
It's likely that your cluster isn't HA anymore.

But if you really want only your specific scenario, something like this:

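# Number of started resources; run this while the cluster is healthy
OK_VALUE=$(crm_mon -1 -r | grep -ci started)
max_tries=10
tries=0
while true
do
   CURRENT=$(crm_mon -1 -r | grep -ci started)
   if [ "$OK_VALUE" -ne "$CURRENT" ]
   then
      tries=$((tries + 1))
      # Alert once, after max_tries consecutive bad checks (about 50 seconds),
      # so a failover that completes in time does not page anybody
      if [ "$tries" -eq "$max_tries" ]
      then
         echo "WARNING: Look at the cluster" | mail -s 'SHIT: Something bad happened to the cluster' [EMAIL PROTECTED]
      fi
   else
      # All resources are started again: reset the counter
      tries=0
   fi
   sleep 5
done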


Warning: Not tested in a shell, it's email-ware.  ;-)

Best regards
Andreas Mock

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
