On Wed, 29 Aug 2007 20:57:37 +0200, Jovan Kostovski <[EMAIL PROTECTED]> wrote:
> On 8/28/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> Ok, and if I want to switch between the 2 nodes? How can I do that?
>> "When problems appear on active node it will ask the standby to take over the
>> resources, and the failover is made."
>>
>> How do I configure monit to do that? In my config, monit can restart a stopped
>> service, but if the service fails to start, the service is declared as
>> "timed out" and... nothing, the node is still primary. Do you follow me?
>
> Hi Vianney,
>
> Sorry for the late reply but I've been busy :(
>
> What monit can do is start, stop and restart a service, and if the service
> fails to start several times it's marked as timed out. That's where heartbeat
> steps in. When some service cannot start, heartbeat will detect that the
> service can't be started and will ask the other node to take over the
> resources.
>
> Here is a good example of configuring monit + heartbeat:
> http://linux.die.net/man/1/monit
> It will give a better preview of the setup that I'm talking about.
>
> You just need to change the monitrc file to look like the following
> (monit+heartbeat+drbd = monitoring the mounted filesystem + mysql + apache):
> > check process postfix with pidfile /var/spool/postfix/pid/master.pid
> >   start program = "/etc/init.d/postfix start"
> >   stop program = "/etc/init.d/postfix stop"
> >   mode active
> >   group local
> >
> > check process heartbeat with pidfile /var/run/heartbeat.pid
> >   start program = "/etc/init.d/heartbeat start"
> >   stop program = "/etc/init.d/heartbeat stop"
> >   mode active
> >   group local
> >
> > check device fs with path /dev/drbd0
> >   start program = "/etc/ha.d/resource.d/ha-fs start"
> >   stop program = "/etc/ha.d/resource.d/ha-fs stop"
> >   if failed permission 660 then unmonitor
> >   if failed uid root then unmonitor
> >   if failed gid root then unmonitor
> >   if space usage > 80% then alert
> >   if space usage > 99% then stop
> >   mode manual
> >   group cluster
> >
> > check process mysql with pidfile /var/lib/mysql/mysqld.pid
> >   start program = "/etc/ha.d/resource.d/ha-mysql start"
> >   stop program = "/etc/ha.d/resource.d/ha-mysql stop"
> >   if failed host localhost port 3306 then restart
> >   if 5 restarts within 5 cycles then timeout
> >   mode manual
> >   group cluster
> >   depends on fs
> >
> > check process apache with pidfile /var/run/httpd2.pid
> >   start program = "/etc/ha.d/resource.d/ha-apache start"
> >   stop program = "/etc/ha.d/resource.d/ha-apache stop"
> >   if failed host myhost.com port 80
> >      protocol HTTP request "/monit/token" then restart
> >   if cpu is greater than 60% for 2 cycles then alert
> >   if cpu > 80% for 5 cycles then restart
> >   if children > 250 then restart
> >   if loadavg(5min) greater than 10 for 8 cycles then stop
> >   if 5 restarts within 5 cycles then timeout
> >   mode manual
> >   group cluster
> >   depends on mysql
> > ----------------------------------------------------------------------
>
> There are two groups:
> local (postfix + heartbeat) and
> cluster (drbd + mysql + apache)
>
> All
> the services that are monitored by heartbeat (the wrapper shell
> scripts) should be added in /etc/ha.d/resource.d. Whatever you put
> and start from that location
> will be monitored by heartbeat, so whenever one of the services from
> the cluster group fails to start several times, heartbeat will
> take over and will execute the failover.
>
> You should specify the hostname and the IP address in
> /etc/ha.d/haresources as well.
>
> For more info on configuring heartbeat check http://www.linux-ha.org/
>
> I hope things are much clearer now ;)
>
> BR, Jovan
>
>
> --
> To unsubscribe:
> http://lists.nongnu.org/mailman/listinfo/monit-general
>

I configured 2 groups and monit declares each failed service as timed out, but it seems that heartbeat doesn't do anything when a service has failed. This is my haresources:

==File haresources==>
Inet-Primaire 10.0.254.254 IPaddr::10.0.254.1 IPaddr::10.0.254.2 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 MailTo::[EMAIL PROTECTED]::InetCluster monit-Inet-Primaire
<==

==File monitrc==>
###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below is the example of some frequently used statements. For information
## about the control file, a complete list of statements and options please
## have a look in the monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 15
#
#
## Set syslog logging with the 'daemon' facility.
## If the FACILITY option is omitted, monit will use 'user' facility by
## default. You can specify the path to the file for monit native logging.
#
set logfile syslog facility log_daemon
#
#
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
#
set mailserver localhost                      # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
#                localhost                   # fallback relay
#
#
## By default monit will drop the event alert, in the case that there is no
## mailserver available. In the case that you want to keep the events for
## later delivery retry, you can use the EVENTQUEUE statement. The base
## directory where undelivered events will be stored is specified by the
## BASEDIR option. You can limit the maximal queue size using the SLOTS
## option (if omitted then the queue is limited just by the backend filesystem).
#
set eventqueue
    basedir /var/monit  # set the base directory where events will be stored
    slots 200           # optionally limit the queue size
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: [EMAIL PROTECTED]                  # sender
## Subject: monit alert -- $EVENT $SERVICE  # subject
##
## $EVENT Service $SERVICE                  #
##                                          #
##      Date:        $DATE                  #
##      Action:      $ACTION                #
##      Host:        $HOST                  # body
##      Description: $DESCRIPTION           #
##                                          #
## --8<--
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
set mail-format { from: [EMAIL PROTECTED] }
#
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
set alert [EMAIL PROTECTED]                        # receive all alerts
# set alert [EMAIL PROTECTED] only on { timeout }  # receive just service-
#                                                  # timeout alert
#
#
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 3001 and
    SSL ENABLE
    PEMFILE /etc/ssl/CA/private/InetAdministration-key-cert.pem
    allow admin:pladppiuc###
#     use address localhost  # only accept connection from localhost
#     allow localhost        # allow localhost to connect to the server and
#     allow admin:monit      # require user 'admin' with password 'monit'
#
#
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
#  check system myhost.mydomain.tld
#    if loadavg (1min) > 4 then alert
#    if loadavg (5min) > 2 then alert
#    if memory usage > 75% then alert
#    if cpu usage (user) > 70% then alert
#    if cpu usage (system) > 30% then alert
#    if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to the recipients in the global section, customized alert will be sent to
## the additional recipient. The service may be grouped using the GROUP option.
#
#  check file apache_bin with path /usr/local/apache/bin/httpd
#    if failed checksum and
#       expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
#    if failed permission 755 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid root then unmonitor
#    alert [EMAIL PROTECTED] on {
#           checksum, permission, uid, gid, unmonitor
#        } with the mail-format { subject: Alarm!
#        }
#    group server
#
#
## Check that a process is running, responding on the HTTP and HTTPS request,
## check its resource usage such as cpu and memory, number of children.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (apache_bin) which
## is defined in the monit control file as well.
#
#  check process apache with pidfile /usr/local/apache/logs/httpd.pid
#    start program = "/etc/init.d/httpd start"
#    stop program = "/etc/init.d/httpd stop"
#    if cpu > 60% for 2 cycles then alert
#    if cpu > 80% for 5 cycles then restart
#    if totalmem > 200.0 MB for 5 cycles then restart
#    if children > 250 then restart
#    if loadavg(5min) greater than 10 for 8 cycles then stop
#    if failed host www.tildeslash.com port 80 protocol http
#       and request "/monit/doc/next.php"
#       then restart
#    if failed port 443 type tcpssl protocol http
#       with timeout 15 seconds
#       then restart
#    if 3 restarts within 5 cycles then timeout
#    depends on apache_bin
#    group server
#
#
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource and an automatic
## graceful stop may be cascaded to them before the filesystem will become
## full and the data will be lost.
#
#  check device datafs with path /dev/sdb1
#    start program = "/bin/mount /data"
#    stop program = "/bin/umount /data"
#    if failed permission 660 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid disk then unmonitor
#    if space usage > 80% for 5 times within 15 cycles then alert
#    if space usage > 99% then stop
#    if inode usage > 30000 then alert
#    if inode usage > 99% then stop
#    group server
#
#
## Check a file's timestamp: when it becomes older than 15 minutes, the
## file is not updated and something is wrong.
## In the case that the size of the file exceeded the given limit,
## perform the script.
#
#  check file database with path /data/mydatabase.db
#    if failed permission 700 then alert
#    if failed uid data then alert
#    if failed gid data then alert
#    if timestamp > 15 minutes then alert
#    if size > 100 MB then exec "/my/cleanup/script"
#
#
## Check the directory permission, uid and gid. An event is triggered
## if the directory does not belong to the user with the uid 0 and
## the gid 0. In addition the permissions have to match the octal
## description of 755 (see chmod(1)).
#
#  check directory bin with path /bin
#    if failed permission 755 then unmonitor
#    if failed uid 0 then unmonitor
#    if failed gid 0 then unmonitor
#
#
## Check the remote host network services availability and the response
## content. One of three pings, a successful connection to a port and
## application level network check is performed.
#
#  check host myserver with address 192.168.1.1
#    if failed icmp type echo count 3 with timeout 3 seconds then alert
#    if failed port 3306 protocol mysql with timeout 15 seconds then alert
#    if failed url
#       http://user:[EMAIL PROTECTED]:8080/?querystring
#       and content == 'action="j_security_check"'
#       then alert

check process vsftpd_ftpserver with pidfile /var/run/vsftpd/vsftpd.pid
  start program = "/etc/init.d/vsftpd start"
  stop program = "/etc/init.d/vsftpd stop"
  if failed port 21 protocol ftp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/etc/init.d/heartbeat stop"
  group Inet-Primaire
  mode manual

check process sshd_remote_access_server with pidfile /var/run/sshd.pid
  start program "/etc/init.d/ssh start"
  stop program "/etc/init.d/ssh stop"
  if failed port 2145 protocol ssh then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process mysql_DBserver with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed host 127.0.0.1 port 3306 then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process apache2_webserver with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  if failed host 127.0.0.1 port 80 protocol http then restart
  # and request "/monit/token" then restart
  if cpu is greater than 60% for 2 cycles then alert
  if cpu > 80% for 2 cycles then restart
  if totalmem > 500 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process postfix_mailserver with pidfile /var/spool/postfix/pid/master.pid
  start program = "/etc/init.d/postfix start"
  stop program = "/etc/init.d/postfix stop"
  if failed port 25 protocol smtp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process ntop_network_monitoring with pidfile /var/run/ntop.pid
  start program = "/etc/init.d/ntop start"
  stop program = "/etc/init.d/ntop stop"
  if failed port 3000 type tcpssl then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process freeradius_auth_server with pidfile /var/run/freeradius/freeradius.pid
  start program = "/etc/init.d/freeradius start"
  stop program = "/etc/init.d/freeradius stop"
  if failed port 1812 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process dhcpd_server with pidfile /var/run/dhcpd.pid
  start program = "/etc/init.d/dhcp3-server start"
  stop program = "/etc/init.d/dhcp3-server stop"
  if failed port 67 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process bind_dns_server with pidfile /var/run/bind/run/named.pid
  start program = "/etc/init.d/bind9 start"
  stop program = "/etc/init.d/bind9 stop"
  if failed port 53 type tcp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process upsd_information with pidfile /var/run/nut/upsd.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process upsmon_control with pidfile /var/run/nut/upsmon.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process ups_driver with pidfile /var/run/nut/usbhid-ups-MGE850VA.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process eserver_emule_server with pidfile /var/run/eserver.pid
  start program = "/etc/init.d/eserver start"
  stop program = "/etc/init.d/eserver stop"
  if failed port 4661 type tcp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process teamspeak_server with pidfile /home/teamspeak/tsserver2.pid
  start program = "/etc/init.d/teamspeak start"
  stop program = "/etc/init.d/teamspeak stop"
  if failed port 8767 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process heartbeat with pidfile /var/run/heartbeat.pid
  start program = "/etc/init.d/heartbeat start"
  stop program = "/etc/init.d/heartbeat stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

###############################################################################
##
## It is possible to include the configuration or its parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#
<==

I don't know why heartbeat doesn't do anything in case of a service failure. Have I missed something?

Thanks,
Vianney

--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general
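
A possible direction, sketched from the commented-out rules in the monitrc above and not from a tested setup: with a haresources-style configuration, heartbeat (as far as I understand it) only checks whether the peer node is alive and does not read monit's service states, so a timeout inside monit would not by itself cause a takeover. One way to connect the two is to let monit stop heartbeat on the failing node once a cluster service keeps failing, so that the peer takes over the resources. The fragment below reuses the mysql_DBserver entry and the /etc/init.d/heartbeat path from the config above. It assumes a monit release that accepts exec as the action of the restart-count rule; some releases may only accept timeout there, so check the manual for the installed version.

```
check process mysql_DBserver with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed host 127.0.0.1 port 3306 then restart
  # Assumption: this monit release allows exec on the restart-count rule.
  # Stopping heartbeat makes this node release its resources so that the
  # peer node can take them over.
  if 2 restarts within 2 cycles then exec "/etc/init.d/heartbeat stop"
  group Inet-Primaire
  mode manual
```

Note that the "check process heartbeat" entry above runs in active mode, so monit would start heartbeat again on a following cycle; the two rules would probably need to be reconciled, for example by also unmonitoring heartbeat before stopping it.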
