Benjamin Lawetz wrote:


Since I can't do that, what I've settled on is heartbeat + mon. Heartbeat will monitor for a system-level failure and switch to the backup machine if necessary, and mon will watch the asterisk (or any other) service and restart it and/or alert me if it fails.

What kind of monitor are you using to monitor asterisk?


Sorry for my slow response. My asterisk monitor right now is embarrassingly simple. All it does is execute "show uptime" and look for output starting with "System"; see below. Obviously the method has limitations: 1) It only tells me that the daemon is running, not that it's able to carry any calls. 2) It only works on localhost.

Input on how to test a remote instance of asterisk would be welcome, as well as a method of making a test call or otherwise reliably testing for the ability to make calls. My impression is that this would require asterisk to have a "Dial" command in the CLI, or a Linux SIP client that I could execute from the shell. I'm not aware of the existence of either.

Any other simple and reliable methods of testing asterisk's condition would be welcome.
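
One possible way to cover the remote case, as a rough sketch only: run the same CLI check over ssh. This assumes key-based ssh from the mon host to each asterisk box (plausible here, since rsync already runs over ssh) and that mon passes the hosts to check as arguments. The function names are made up for illustration, and like the local check it still only shows that the daemon answers, not that calls work.

```sh
#!/bin/sh
# Sketch only: a remote variant of asterisk.monitor.
# Assumes key-based ssh to each target host (hypothetical setup).

# Run "show uptime" on a remote host and print the raw CLI output.
remote_uptime() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" \
        '/usr/sbin/asterisk -rx "show uptime"' 2>/dev/null
}

# Succeed only if the given output starts with "System".
uptime_ok() {
    case "$1" in
        System*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Check every host given as an argument. Print the failed hosts on one
# line and return non-zero if any of them failed.
check_hosts() {
    failed=""
    for host in "$@"; do
        uptime_ok "$(remote_uptime "$host")" || failed="$failed $host"
    done
    if [ -n "$failed" ]; then
        echo $failed    # unquoted on purpose: trims the leading space
        return 1
    fi
    return 0
}

# The script's exit status is the result of this last command.
check_hosts "$@"
```

BatchMode keeps ssh from hanging on a password prompt, and ConnectTimeout keeps a dead host from stalling the whole check past the monitor interval.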

The alerts, by the way, are pretty simple as well. See the excerpt from mon.cf below. restartasterisk.alert does exactly what it says. stopeverything.alert shuts down heartbeat, which will cause another node in the cluster to take over. In fact, that node will start mon, which will then use restartasterisk.alert to start up asterisk. Asterisk only starts on the backup machine when the primary fails, so that config changes replicated from the primary will take effect. Total downtime should be < 3 min, which will let me hit five nines if it only happens once a year ;)
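
As a quick sanity check on the availability claim: 99.999% uptime leaves roughly 5.26 minutes of downtime per 365-day year, so one failover of under 3 minutes fits, but two would not. A small sketch of the arithmetic (the function name is just for illustration):

```sh
# allowed_minutes PERCENT
# Print the minutes of downtime per 365-day year permitted at the
# given availability percentage.
allowed_minutes() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", 365 * 24 * 60 * (100 - a) / 100 }'
}

allowed_minutes 99.999   # five nines
allowed_minutes 99.99    # four nines
```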

Config changes are replicated via rsync and ssh every few minutes. Voicemails are also copied from primary to backup by rsync. One thing I still need to do is make rsync stop attempting to replicate files when the failover occurs. That will probably just require another alert below the "stopeverything.alert".
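
A minimal sketch of how that extra alert might work, assuming the rsync job and the alert run on the same machine: the alert drops a flag file, and the replication job refuses to run while the flag exists. The flag path and both function names are hypothetical.

```sh
# Hypothetical flag file; any path both scripts agree on would do.
FAILOVER_FLAG=${FAILOVER_FLAG:-/var/run/asterisk-failover}

# Would be called from an alert listed after stopeverything.alert.
mark_failover() {
    touch "$FAILOVER_FLAG"
}

# Run the given command only while no failover has been flagged.
run_unless_failover() {
    if [ -e "$FAILOVER_FLAG" ]; then
        echo "failover flagged, skipping: $*" >&2
        return 0
    fi
    "$@"
}
```

The cron job would then wrap its existing command, for example: run_unless_failover rsync -az -e ssh /etc/asterisk/ backup:/etc/asterisk/ (host name and paths are placeholders). Removing the flag file by hand after the primary is repaired would re-enable replication.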

The replication, of course, means that this setup will not protect me from a bad config change that breaks asterisk, as that change will be replicated throughout the cluster. So all significant config changes should be tested on a standalone box first.


[EMAIL PROTECTED] mon]# cat /usr/lib/mon/mon.d/asterisk.monitor
#!/bin/sh
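## Can only check localhost. Always checks localhost regardless of input.

       # Keep the first 6 bytes of the CLI output; a healthy daemon's reply starts with "System"
       SHOW_UPTIME=`/usr/sbin/asterisk -rx "show uptime" | /bin/cut -b 1-6`
       # Quote the variable and use "=" so the test is valid in plain sh, even with empty output
       if [ "$SHOW_UPTIME" = "System" ]; then
               exit 0
       else
               # Report the failed host to mon
               echo "localhost"
               exit 1
       fi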


From mon.cf:

watch asterisk
       service asterisk
               description asterisk pbx on localhost
               interval 10s
               monitor asterisk.monitor
               period wd {Sun-Sat}
                       alert mail.alert [EMAIL PROTECTED]
                       alert restartasterisk.alert [EMAIL PROTECTED]
                       alertevery 30s
       service asterisk-failover
               description checking if we need to stop heartbeat
               interval 10s
               monitor asterisk.monitor
               period wd {Sun-Sat}
                       alert stopeverything.alert [EMAIL PROTECTED]
                       alertafter 5 3m

Asterisk-Users mailing list
[email protected]
http://lists.digium.com/mailman/listinfo/asterisk-users