Hi there,

I'm running a heartbeat web cluster with mon on Fedora Core 3 (kernel 2.6.9-1.667):
- heartbeat 1.2.3
- mon 0.99.2-8


I want mon to restart a failed service before mon restarts the whole heartbeat service.
My mon.cf looks as follows:


#
# global options
#
cfbasedir    = /etc/mon
pidfile      = /var/run/mon.pid
statedir     = /var/run/mon/state.d
logdir       = /var/run/mon/log.d
dtlogfile    = /var/run/mon/log.d/downtime.log
alertdir     = /usr/lib/mon/alert.d
mondir       = /usr/lib/mon/mon.d
maxprocs     = 20
histlength   = 100
randstart    = 60s

#
# authentication types:
#   getpwnam      standard Unix passwd, NOT for shadow passwords
#   shadow        Unix shadow passwords (not implemented)
#   userfile      "mon" user file
#
authtype = getpwnam

#
# downtime logging, uncomment to enable
# if the server is running, don't forget to send a reset command
# when you change this
#
dtlogging = yes

#
# NB:  hostgroup and watch entries are terminated with a blank line (or
# end of file).  Don't forget the blank lines between them or you lose.
#
# In this case the hostgroup is the cluster IP

hostgroup phoebe 192.108.234.168

watch phoebe
    service ping
        interval 10s
        monitor fping.monitor -r 4 -t 6000
        allow_empty_group
        period wd {Sun-Sat}
            alert bring-ha-down.alert -S "No link to master node!!" [EMAIL PROTECTED]
            # alert winpopup.alert antila
            alertevery 5m
            upalert mail.alert -S "Link is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert pc1 pc2 pc3
            upalertafter 5m
    service httpd
        depend phoebe:ping
        monitor watch_process.monitor /usr/sbin/httpd ;;
        allow_empty_group
        interval 10s
        period RESTART:
            alert mail.alert -S "Service httpd NOT running!! Trying to restart..." [EMAIL PROTECTED]
            alert httpd_restart.alert
            # alert winpopup.alert antila
            upalert mail.alert -S "Webserver is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert antila
            upalertafter 1m
        period RESTART_FAILED:
            alert bring-ha-down.alert -S "Restart httpd failed! Shutting down heartbeat for takeover..." [EMAIL PROTECTED]
            # alert winpopup.alert antila
    service tomcat
        depend phoebe:httpd
        monitor watch_process.monitor /usr/java/j2sdk1.4.2_07/bin/java ;;
        allow_empty_group
        interval 10s
        period RESTART:
            alert mail.alert -S "Service tomcat NOT running!! Trying to restart..." [EMAIL PROTECTED]
            alert tomcat_restart.alert
            # alert winpopup.alert pc1 pc2 pc3
            upalert mail.alert -S "Tomcat service is back up!" [EMAIL PROTECTED]
            # upalert winpopup.alert pc1 pc2 pc3
            upalertafter 1m
        period RESTART_FAILED:
            alert bring-ha-down.alert -S "Restart tomcat failed! Shutting down heartbeat for takeover..." [EMAIL PROTECTED]
            # alert winpopup.alert pc1 pc2 pc3



When I start heartbeat, all services come up without any problems. Now I want to test whether mon tries to restart the httpd service when I stop it. mon notices that the service is no longer running, but it immediately calls the bring-ha-down.alert script from the RESTART_FAILED period instead of the httpd_restart.alert script from the RESTART period.
If I comment out the RESTART_FAILED entries, restarting the service works.
The odd thing is that this configuration worked the first time I used it, but not the next few times. I took the examples from a Linux magazine, so they should work.
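
For reference, here is a minimal sketch of how I understand the two periods are supposed to work together. It assumes that both periods are active at the same time and that the RESTART_FAILED period needs an alertafter threshold, so that it only fires after several consecutive failures. The "wd {Sun-Sat}" period specs and the threshold of 3 are my own assumptions; neither is in my config above, and I have not confirmed that this is what the magazine example intended:

```
service httpd
    depend phoebe:ping
    monitor watch_process.monitor /usr/sbin/httpd ;;
    interval 10s
    period RESTART: wd {Sun-Sat}
        # first failure: try to restart the service locally
        alert httpd_restart.alert
    period RESTART_FAILED: wd {Sun-Sat}
        # assumption: only escalate after 3 consecutive failures
        alertafter 3
        alert bring-ha-down.alert -S "Restart httpd failed!"
```

With a threshold like this, the restart alert should get a couple of monitor intervals to bring httpd back before heartbeat is shut down for takeover.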


What is mon's problem here? Can somebody help?



_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
