If I get some time, I'll take a look. It almost sounds as if, because
apache is not being monitored, ospfd somehow falls "out of dependency"
and then thinks that it doesn't have one.
On 12/6/11 2:39 PM, drich wrote:
Changing the cycle time changes the frequency but not what happens. I
still see essentially the following:
* apache stops and monit detects it
* monit attempts a restart
* monit stops ospfd
* monit starts apache
* monit unmonitors ospfd
* ... 30 seconds later ...
* apache fails to start
* monit starts ospfd
* ... repeat the above cycle ...
* after 2 cycles, it triggers my "2 restarts" rule and stops ospfd
I don't think it should be starting ospfd at all since the dependent
service is failing to restart.
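To make the expectation concrete, here is a minimal Python sketch of the ordering I would expect. It is only a toy model, not monit's actual code: the Service class, the can_start flag and the restart() helper are all made up for illustration.

```python
# Hypothetical model of the expected dependency behaviour; not monit's code.
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    can_start: bool = True   # whether the start program would succeed
    running: bool = False
    dependents: list = field(default_factory=list)


def restart(parent, log):
    """Restart parent; bring dependents back only if the parent came up."""
    # Dependents are stopped first, as in the observed sequence.
    for dep in parent.dependents:
        dep.running = False
        log.append(f"{dep.name} stop")
    parent.running = parent.can_start
    log.append(f"{parent.name} start {'ok' if parent.running else 'failed'}")
    if not parent.running:
        # Expected: leave dependents down while the parent is down.
        return False
    for dep in parent.dependents:
        dep.running = dep.can_start
        log.append(f"{dep.name} start")
    return True


ospfd = Service("ospfd", running=True)
apache = Service("apache", can_start=False, dependents=[ospfd])
events = []
restart(apache, events)
print(events)
```

In this model a failed apache start leaves ospfd stopped, which is the behaviour I was expecting to see.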
On 06.12.2011 10:34, Rory Toma wrote:
What is your cycle time? Is it 30 sec? If it is, try increasing it to
1 minute.
On 12/6/11 9:12 AM, drich wrote:
As I mentioned in an earlier e-mail, I'm trying to get monit to
watch a group of processes so it can start/stop ospfd for an anycast
high availability application. However, in doing this I'm seeing
some odd behaviour that doesn't match what I expect -- is this a bug?
In the scenario below, why is it ever trying to start ospfd? If
apache is down, shouldn't ospfd stay down until apache comes back up
or is monitored again after being unmonitored? It does end up in the
correct state at the end, but not without restarting and stopping
ospfd twice in the meantime.
As an example, I have the following configured:
check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if failed host localhost port 80 protocol http
    and request "/" then restart
  if 2 restarts within 2 cycles then stop
check process ospfd with pidfile /var/run/quagga/ospfd.pid
  start program = "/etc/init.d/ospfd start"
  stop program = "/etc/init.d/ospfd stop"
  depends on apache
If I make it so that apache cannot run (by removing execute
permissions on /usr/sbin/httpd) and then kill it, I see the
following in the monit logs:
Dec 6 08:47:39 tecate monit[9988]: 'apache' process is not running
Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart
Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:47:40 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor action done
Dec 6 08:48:09 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:09 tecate monit[9988]: 'apache' process is not running
Dec 6 08:48:09 tecate monit[9988]: 'apache' trying to restart
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:48:09 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:39 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:39 tecate monit[9988]: 'apache' service restarted 2 times within 2 cycles(s) - stop
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:48:39 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
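For anyone who wants to spot the same pattern in their own logs, a rough Python sketch follows. It assumes the syslog layout shown above; the regular expression and the parse() and premature_starts() helpers are my own illustration and not part of monit.

```python
import re

# Matches lines such as:
# Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
LINE = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+\S+\s+monit\[\d+\]:\s+'([^']+)'\s+(.*)$"
)


def parse(lines):
    """Return (time, service, message) tuples for per-service monit lines."""
    events = []
    for line in lines:
        m = LINE.match(line)
        if m:
            events.append((m.group(3), m.group(4), m.group(5)))
    return events


def premature_starts(events, parent="apache", child="ospfd"):
    """Times at which child is started right after parent failed to start."""
    hits = []
    parent_failed = False
    for ts, svc, msg in events:
        if svc == parent:
            parent_failed = msg.startswith("failed to start")
        elif svc == child and parent_failed and msg.startswith("start:"):
            hits.append(ts)
    return hits


sample = [
    "Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart",
    "Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd",
    "Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened",
    "Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start",
    "Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd",
]
flagged = premature_starts(parse(sample))
print(flagged)
```

On the five sample lines this flags the ospfd start at 08:48:09, the one that immediately follows "'apache' failed to start".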
--
Dan Rich
http://www.employees.org/~drich/
/"Step up to red alert!" "Are you sure, sir?
It means changing the bulb in the sign..."/
- Red Dwarf (BBC)
--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general