Hi there,
I'm running a Heartbeat web cluster with mon on Fedora Core 3 (kernel 2.6.9-1.667):
- heartbeat 1.2.3
- mon 0.99.2-8
I want mon to try restarting a failed service before it restarts the whole Heartbeat service.
My mon.cf looks as follows:
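One way to describe the intended escalation ("restart first, fail over only if that doesn't help") is to give each period its own threshold of consecutive failures. The Python below is a simplified model, not mon's code. It is loosely based on mon's per-period `alertafter` setting. The `Period` class, the `alerts_to_fire` function and the threshold of 3 are made up for illustration. The model ignores `alertevery`, `numalerts` and period time specs.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Simplified stand-in for a mon period (hypothetical model): a label,
    the alert it runs, and how many consecutive failures must be seen
    before that alert fires."""
    label: str
    alert: str
    alertafter: int = 1


def alerts_to_fire(periods, consecutive_failures):
    """Return the alerts whose failure threshold has been reached.

    Assumption: every period is checked independently against the same
    failure count, so two periods that both use the default threshold
    of 1 would both fire on the very first failure.
    """
    return [p.alert for p in periods if consecutive_failures >= p.alertafter]


# Hypothetical two-stage escalation: restart on the first failure and
# hand over to heartbeat only after three consecutive failed checks.
escalation = [
    Period("RESTART", "httpd_restart.alert", alertafter=1),
    Period("RESTART_FAILED", "bring-ha-down.alert", alertafter=3),
]

if __name__ == "__main__":
    for failures in range(1, 4):
        print(failures, alerts_to_fire(escalation, failures))
```

In this model, failures 1 and 2 only trigger the restart alert. The takeover alert joins in at failure 3.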
#
# global options
#
cfbasedir = /etc/mon
pidfile = /var/run/mon.pid
statedir = /var/run/mon/state.d
logdir = /var/run/mon/log.d
dtlogfile = /var/run/mon/log.d/downtime.log
alertdir = /usr/lib/mon/alert.d
mondir = /usr/lib/mon/mon.d
mondir = /usr/lib/mon/mon.d
maxprocs = 20
histlength = 100
randstart = 60s

#
# authentication types:
#   getpwnam   standard Unix passwd, NOT for shadow passwords
#   shadow     Unix shadow passwords (not implemented)
#   userfile   "mon" user file
#
authtype = getpwnam

#
# downtime logging, uncomment to enable
# if the server is running, don't forget to send a reset command
# when you change this
#
dtlogging = yes

#
# NB: hostgroup and watch entries are terminated with a blank line (or
# end of file). Don't forget the blank lines between them or you lose.
#

# In this case the hostgroup is the cluster IP
hostgroup phoebe 192.108.234.168

watch phoebe
    service ping
        interval 10s
        monitor fping.monitor -r 4 -t 6000
        allow_empty_group
        period wd {Sun-Sat}
            alert bring-ha-down.alert -S "No link to master node!!" [EMAIL PROTECTED]
            # alert winpopup.alert antila
            alertevery 5m
            upalert mail.alert -S "Link is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert pc1 pc2 pc3
            upalertafter 5m
    service httpd
        depend phoebe:ping
        monitor watch_process.monitor /usr/sbin/httpd ;;
        allow_empty_group
        interval 10s
        period RESTART:
            alert mail.alert -S "Service httpd NOT running!! Trying to restart..." [EMAIL PROTECTED]
            alert httpd_restart.alert
            # alert winpopup.alert antila
            upalert mail.alert -S "Webserver is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert antila
            upalertafter 1m
        period RESTART_FAILED:
            alert bring-ha-down.alert -S "Restart httpd failed! Shutting down heartbeat for takeover..." [EMAIL PROTECTED]
            # alert winpopup.alert antila
    service tomcat
        depend phoebe:httpd
        monitor watch_process.monitor /usr/java/j2sdk1.4.2_07/bin/java ;;
        allow_empty_group
        interval 10s
        period RESTART:
            alert mail.alert -S "Service tomcat NOT running!! Trying to restart..." [EMAIL PROTECTED]
            alert tomcat_restart.alert
            # alert winpopup.alert pc1 pc2 pc3
            upalert mail.alert -S "Tomcat service is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert pc1 pc2 pc3
            upalertafter 1m
        period RESTART_FAILED:
            alert bring-ha-down.alert -S "Restart tomcat failed! Shutting down heartbeat for takeover..." [EMAIL PROTECTED]
            # alert winpopup.alert pc1 pc2 pc3
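watch_process.monitor is not one of the monitors shipped with mon. Below is a hypothetical Python stand-in, not the magazine's script, showing what such a monitor has to do. It follows mon's monitor convention: exit 0 when everything is fine and non-zero on failure, with the first output line serving as the summary. It assumes Linux /proc and that mon runs it as root, because reading /proc/PID/exe for other users' processes needs privileges. It also assumes that the trailing `;;` stops mon from appending the hostgroup members, so only the executable path arrives as an argument. The function names are made up.

```python
#!/usr/bin/env python3
"""Hypothetical watch_process.monitor stand-in: exit 0 if a process is
running each given executable, otherwise print a summary and exit 1."""
import os
import sys


def running_executables(proc_root="/proc"):
    """Yield the executable path of every process we are allowed to inspect."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            yield os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            # Process exited meanwhile, is a kernel thread, or permission denied.
            continue


def missing(paths, proc_root="/proc"):
    """Return the requested executables that have no running process."""
    seen = set(running_executables(proc_root))
    # realpath() so a symlinked path (e.g. a java install dir) still matches.
    return [p for p in paths if os.path.realpath(p) not in seen]


def main(argv):
    not_running = missing(argv)
    if not_running:
        # mon treats the first line of output as the failure summary.
        print("not running: " + " ".join(not_running))
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only act when at least one executable path was passed in.
    sys.exit(main(sys.argv[1:]))
```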
When I start Heartbeat, all services come up without any problems. Now I want to test whether mon tries to restart the httpd service when I stop it. mon notices that the service isn't running anymore. But instead of calling httpd_restart.alert from the RESTART period, it immediately calls bring-ha-down.alert from the RESTART_FAILED period.
If I comment out the RESTART_FAILED entries, mon restarts the service correctly.
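Since the restart path itself works, here for comparison is a hypothetical sketch of what httpd_restart.alert might look like in Python. It is not the script from the magazine. The option names (-s for the service name, -u for an upalert) are my assumption about mon's alert calling convention, and any other options mon passes are ignored. `/sbin/service httpd restart` assumes the Fedora Core init script. `RESTART_COMMANDS`, `restart_command` and the injectable `run` parameter are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical httpd_restart.alert sketch.

Assumes mon calls alerts with options like -s <service> -g <group>,
plus -u for upalerts; all other options are ignored.
"""
import argparse
import subprocess
import sys

# Assumed restart command per mon service name; add entries as needed.
RESTART_COMMANDS = {
    "httpd": ["/sbin/service", "httpd", "restart"],
}


def restart_command(argv):
    """Return the command to run for this alert, or None if nothing to do."""
    parser = argparse.ArgumentParser(add_help=False)  # -h is mon's host list
    parser.add_argument("-s", dest="service")
    parser.add_argument("-g", dest="group")
    parser.add_argument("-u", dest="upalert", action="store_true")
    opts, _ignored = parser.parse_known_args(argv)
    if opts.upalert:
        return None  # the service is back up, nothing to restart
    return RESTART_COMMANDS.get(opts.service)


def main(argv, run=subprocess.call):
    """Run the restart command (if any) and return its exit status."""
    cmd = restart_command(argv)
    if cmd is None:
        return 0
    return run(cmd)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, calling it as `httpd_restart.alert -s httpd -g phoebe` would run `/sbin/service httpd restart` and exit with that command's status.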
The funny thing is that this configuration worked the first time I used it, but not the next few times. I took the examples from a Linux magazine article, so they should work.
What is mon's problem here? Can somebody help?
_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon