Just as a follow-up: I took a look at the output of clustat -x, and one of the values is "last transition". I wrote a check that looks up a given service and calculates the difference between the current time and that service's last transition. If the difference is lower than a given threshold, it alarms. It is kind of a hack, but it will do until I can get scripts and log-parsing checks in place for a more proactive approach.
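A minimal sketch of the check described above, written as a Python Nagios plugin. It assumes that each service appears in the clustat -x XML as a <group> element with a name attribute such as "service:web" and a last_transition attribute holding seconds since the epoch. Verify this against your own clustat -x output. The function names, the 600-second default threshold and the choice of CRITICAL as the alarm state are illustrative, not taken from the original check. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def last_transition(xml_text, service):
    """Return the service's last_transition time (epoch seconds), or None.

    Assumes clustat -x emits <group name="service:NAME" last_transition="EPOCH" .../>.
    """
    root = ET.fromstring(xml_text)
    for group in root.iter("group"):
        if group.get("name") in (service, "service:" + service):
            value = group.get("last_transition")
            return int(value) if value is not None else None
    return None


def check_transition(xml_text, service, threshold, now=None):
    """Return (exit_code, message); CRITICAL if the service changed state recently."""
    if now is None:
        now = time.time()
    changed = last_transition(xml_text, service)
    if changed is None:
        return UNKNOWN, f"UNKNOWN - no last_transition found for {service}"
    age = int(now - changed)
    if age < threshold:
        return CRITICAL, f"CRITICAL - {service} changed state {age}s ago"
    return OK, f"OK - {service} last changed state {age}s ago"


def main(argv):
    service = argv[0]
    threshold = int(argv[1]) if len(argv) > 1 else 600  # hypothetical default: 10 min
    try:
        xml_text = subprocess.run(["clustat", "-x"], capture_output=True,
                                  text=True, check=True).stdout
        code, message = check_transition(xml_text, service, threshold)
    except (OSError, subprocess.CalledProcessError, ET.ParseError, ValueError) as exc:
        code, message = UNKNOWN, f"UNKNOWN - could not read clustat -x output: {exc}"
    print(message)
    return code


# Run the live check only when a service name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run it from NRPE as, for example, check_failover.py web 900 (the script name is hypothetical). The XML parsing is kept separate from the clustat call so the threshold logic can be tested against saved output.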
B

On Fri, Feb 20, 2009 at 7:17 PM, Burton Simonds <[email protected]> wrote:
> I was actually looking in Google for something like that earlier
> today. That would work, but it still has the issue of tracking the
> previous state. From what I have read about the clustered services
> checks, they will see whether the service is running somewhere, but
> will not notify you if the service has changed state. I am running
> NRPE on the clustered hosts and using it to check the processes on
> each of the hosts.
>
> I am looking at setting up the cluster-snmp stuff, and I will see if
> that provides the information I need. Otherwise, I might just go
> with log scraping.
>
> B
>
> On Fri, Feb 20, 2009 at 5:55 PM, eric rosel <[email protected]>
> wrote:
>> Hi List,
>>
>> I've been toying with the idea of writing an init script resource that will
>> send an alert to <type your favorite network/host monitoring system here>
>> every time it gets called with a "start" or "stop" argument.
>>
>> Another way is to make it send "alive" messages every time it's called with
>> "status", and then configure your monitoring app to sound the sirens when it
>> stops getting those messages, or if the source of those messages changes.
>>
>> One then simply has to include this script resource with a clustered service.
>>
>> -eric
>>
>>
>> --- On Sat, 2/21/09, Martin Fuerstenau <[email protected]> wrote:
>>
>>> From: Martin Fuerstenau <[email protected]>
>>> Subject: Re: [Linux-cluster] Monitoring Failovers
>>> To: "linux clustering" <[email protected]>
>>> Date: Saturday, February 21, 2009, 12:41 AM
>>>
>>> It is a little bit hard to do. It is on my to-do list too. The
>>> problem is determining the old state. For example, if you switch an
>>> IP address and you have a service bound to that address, you have
>>> nearly no chance of monitoring it from the Nagios side.
>>>
>>> I have tested using the MAC address and ARP, but this is awkward if
>>> you have bonding, because if the MAC switches, it may be the bonding
>>> or the cluster that switched. And hardcoding MAC addresses in the
>>> monitor script would not be a good idea.
>>>
>>> Too much trouble in maintenance.
>>>
>>> If anyone has a good idea, I will write the plugin and post it to
>>> NagiosExchange.
>>>
>>> Martin Fuerstenau
>>>
>>> On Fri, 2009-02-20 at 11:04 -0500, Burton Simonds wrote:
>>> > I am in the process of setting up Nagios for system monitoring, and I
>>> > would like to have a way to know if a failover has occurred. If
>>> > everything works as it should, there should be minimal impact on the
>>> > services. Right now it looks like my best bet is basically to scrape
>>> > the logs, look for the failover messages there, and trigger an alarm.
>>> >
>>> > I was wondering if anyone else has done anything. I found in an
>>> > archive a check_rhcs script that I am going to employ (which looks
>>> > pretty cool), but that just looks at the status of the services. I
>>> > want to either compare the current status to the previous status or
>>> > have something monitoring the cluster that pushes the alert to Nagios.
>>> >
>>> > Thanks,
>>> > B
>>
>> --
>> Linux-cluster mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/linux-cluster
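eric's script-resource idea could look roughly like the Python sketch below. Everything in it is hypothetical. It assumes that rgmanager calls the script with start, stop, or status as its first argument, as it does for init scripts. It assumes send_nsca is installed on each node and reads tab-separated host, service, return code and output fields on stdin. It also assumes Nagios has a matching passive service, called "Failover web" on a host "mycluster" here, with freshness checking enabled so that missing "alive" messages raise an alarm. Substitute whatever transport your monitoring system actually uses.

```python
import socket
import subprocess
import sys

# Hypothetical settings; adjust to your own Nagios/NSCA configuration.
SERVICE = "web"                      # cluster service this resource belongs to
NAGIOS_HOST_NAME = "mycluster"       # Nagios host that owns the passive check
NSCA_SERVER = "nagios.example.com"   # where send_nsca delivers results
SEND_NSCA = ["send_nsca", "-H", NSCA_SERVER]  # assumes send_nsca is installed


def build_alert(action, service, node):
    """Map an rgmanager action to a (return_code, plugin_output) pair, or None."""
    if action == "start":
        return 1, f"WARNING - {service} started on {node}"
    if action == "stop":
        return 1, f"WARNING - {service} stopped on {node}"
    if action == "status":
        # Periodic "alive" message; Nagios freshness checks catch its absence.
        return 0, f"OK - {service} alive on {node}"
    return None


def nsca_line(host, svc_description, code, output):
    """Format one passive result as tab-separated send_nsca input."""
    return f"{host}\t{svc_description}\t{code}\t{output}\n"


def main(argv):
    action = argv[0] if argv else ""
    alert = build_alert(action, SERVICE, socket.gethostname())
    if alert is not None:
        code, output = alert
        line = nsca_line(NAGIOS_HOST_NAME, "Failover " + SERVICE, code, output)
        try:
            subprocess.run(SEND_NSCA, input=line, text=True, timeout=10,
                           stdout=subprocess.DEVNULL)
        except (OSError, subprocess.TimeoutExpired):
            pass  # never let a monitoring problem fail the cluster service
    return 0  # always report success to rgmanager


# rgmanager passes the action (start/stop/status) as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The script always exits 0, so a monitoring outage cannot make rgmanager consider the service failed. Because the node name is part of every message, a change in the source of the "alive" messages shows up in the service output. Note that a normal start at boot also produces a WARNING.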
