I think I know what is happening, but I'm not sure how to fix it
(yet, I hope).

Running "monit stop ospfd" causes monit to wake up and start
processing, which then triggers the "if does not exist" test in the
apache block again. Is there a way to have monit execute that block only
once, or only on a state change? I'm assuming that "if recovered" fires
only when the application first recovers and not every time it is up
(is that a valid assumption? I can't find it in the docs), so is there
an equivalent for "if does not exist"?


Is there any choice other than creating a semaphore file and doing
something like: 

  if does not exist
    then exec "/bin/bash -c 'if [ ! -f /tmp/monit.apachedown ]; then touch /tmp/monit.apachedown; /usr/bin/monit stop ospfd; fi'"
    else if recovered then exec "/bin/bash -c 'rm /tmp/monit.apachedown && /usr/bin/monit monitor ospfd'"

My big concern with that is getting into a state where apache is up and
the file still exists, so ospfd will not go down if apache fails.
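
One way to narrow that window might be to move the semaphore logic out
of the inline "bash -c" strings and into a small helper script. This is
only a sketch and has not been tested against monit: the script name
(apache-flag.sh), the function names, and the FLAG/MONIT override
variables are assumptions made up for illustration. The only monit
commands it uses are the two already shown above ("monit stop ospfd" and
"monit monitor ospfd").

```shell
#!/bin/bash
# Hypothetical helper (the name apache-flag.sh is an assumption).
# FLAG and MONIT can be overridden from the environment for testing.
FLAG="${FLAG:-/tmp/monit.apachedown}"
MONIT="${MONIT:-/usr/bin/monit}"

# Intended for "if does not exist": stop ospfd only on the first failure.
apache_down() {
    # With noclobber set, the redirect fails if the flag already exists,
    # so the existence test and the create happen in one step.
    if ( set -o noclobber; : > "$FLAG" ) 2>/dev/null; then
        "$MONIT" stop ospfd
    fi
}

# Intended for "else if recovered": always clear the flag, then
# re-enable monitoring of ospfd.
apache_up() {
    rm -f "$FLAG"            # no error if the flag is already gone
    "$MONIT" monitor ospfd
}

case "${1:-}" in
    down) apache_down ;;
    up)   apache_up ;;
esac
```

The exec lines would then call the script with "down" or "up" instead of
the inline bash. Using "rm -f" means the recovery branch re-monitors
ospfd even if the file is already missing, unlike "rm ... &&". It does
not help if monit never runs the recovery action at all; in that case
the flag would still have to be cleared some other way.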

The good thing about the above is that I can add the dependency
statements back to my ospfd config, and it does bring ospfd down when
apache fails.


The downside is that it never runs the restart. Can you see anything
wrong with the following block that would prevent it from trying to
restart apache? If I explicitly run "monit restart apache" it will
restart apache, delete the semaphore and restart ospfd, but it will
never do so by itself. Does the "does not exist" test matching prevent
the "if failed" test from running? I never see a timeout in the logs.


check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if does not exist
    then exec "/bin/bash -c 'if [ ! -f /tmp/monit.apachedown ]; then touch /tmp/monit.apachedown; /usr/bin/monit stop ospfd; fi'"
    else if recovered then exec "/bin/bash -c 'rm /tmp/monit.apachedown && /usr/bin/monit monitor ospfd'"
  if failed host localhost port 80 protocol http
    and request "/" then restart
  if children > 50 then restart
  if 2 restarts within 2 cycles then timeout
  group server
  depends on tomcat

And the log from an httpd failure says: 

Dec 13 13:18:22 tecate monit[13602]: 'apache' process is not running
Dec 13 13:18:22 tecate monit[13602]: 'apache' exec: /bin/bash
Dec 13 13:18:22 tecate monit[13602]: 'ospfd' stop on user request
Dec 13 13:18:22 tecate monit[13602]: monit daemon at 13602 awakened
Dec 13 13:18:22 tecate monit[13602]: Awakened by User defined signal 1
Dec 13 13:18:22 tecate monit[13602]: 'ospfd' stop: /etc/init.d/ospfd
Dec 13 13:18:22 tecate monit[13602]: 'ospfd' stop action done
Dec 13 13:18:22 tecate monit[13602]: 'apache' process is not running
Dec 13 13:18:22 tecate monit[13602]: 'apache' exec: /bin/bash
Dec 13 13:18:22 tecate monit[13602]: 'ospfd' unmonitor on user request
Dec 13 13:18:22 tecate monit[13602]: monit daemon at 13602 awakened
Dec 13 13:18:22 tecate monit[13602]: Awakened by User defined signal 1
Dec 13 13:18:22 tecate monit[13602]: 'ospfd' unmonitor action done
Dec 13 13:18:22 tecate monit[13602]: 'apache' process is not running
Dec 13 13:18:22 tecate monit[13602]: 'apache' exec: /bin/bash
Dec 13 13:19:22 tecate monit[13602]: 'apache' process is not running
Dec 13 13:19:22 tecate monit[13602]: 'apache' exec: /bin/bash
Dec 13 13:20:22 tecate monit[13602]: 'apache' process is not running
Dec 13 13:20:22 tecate monit[13602]: 'apache' exec: /bin/bash
... which repeats until I run monit restart apache ...

On 08.12.2011 09:11, drich wrote: 

> Eric,
>
> That's where I started. The problem with that is that it will start
> ospfd every time apache fails to restart. I end up with entries in the
> log like:
> 
> Dec 6 08:47:39 tecate monit[9988]: 'apache' process is not running
> Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart
> Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
> Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
> Dec 6 08:47:40 tecate monit[9988]: 'ospfd' unmonitor on user request
> Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened
> Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor action done
> Dec 6 08:48:09 tecate monit[9988]: Awakened by User defined signal 1
> 
> The biggest problem is that when this happens it leaves ospfd running
> even if apache isn't. Martin commented that dependencies are "soft":
> they define the start/stop order but don't wait for the parent to
> recover before starting the dependent service.
>
> I'm going to take a look at the code today; the problem I'm seeing
> right now looks like a race condition. My guess is that when I call
> "monit stop ospfd" it hasn't yet marked apache as not existing, so the
> "if does not exist" block is being executed again and again and again.
>
> Here is the config I am working with now:
> 
> check process apache with pidfile /var/run/httpd.pid
>   start program = "/etc/init.d/httpd start"
>   stop program = "/etc/init.d/httpd stop"
>   if does not exist
>     then exec "/usr/bin/monit stop ospfd"
>     else if recovered then exec "/usr/bin/monit monitor ospfd"
>   if failed host localhost port 80 protocol http
>     and request "/" then restart
>   if children > 50 then restart
>   if 2 restarts within 2 cycles then timeout
>   group server
>
> check process ospfd with pidfile /var/run/quagga/ospfd.pid
>   start program = "/etc/init.d/ospfd start"
>   stop program = "/etc/init.d/ospfd stop"
>   group network
> 
> On 08.12.2011 00:10, Eric Pailleau wrote:
>
>> Hello,
>> did you simply try this ?
>>
>> ---8< [...] 50 then restart
>>   if 2 restarts within 2 cycles then timeout
>>   group server
>>   depends on tomcat
>>
>> check process ospfd with pidfile /var/run/quagga/ospfd.pid
>>   start program = "/etc/init.d/ospfd start"
>>   stop program = "/etc/init.d/ospfd stop"
>>   depends on apache
>>   depends on fcserver
>>   depends on mysql
>>   depends on tomcat
>>   group network
>> ---8< [...]
>>
>> Taking out the depends doesn't make a difference, it still stays in
>> that loop where it is spewing to the logs.
>> 
>> I'm off-site today, I'll look at this more tomorrow morning when I
>> can pay attention to it rather than to the lecture I'm supposed to be
>> listening to. :-)
>>
>> On 07.12.2011 13:13, Martin Pala wrote:
>>
>>> Yes, Eric is correct. The "monit stop…" in the exec action cannot
>>> be combined in this case with the "depends on…"
>

> -- 
> 
> Dan Rich 
> http://www.employees.org/~drich/ [1]
> "Step up to red alert!" "Are you sure, sir?
> It means changing the bulb in the sign..."
> - Red Dwarf (BBC)

-- 

Dan Rich 

http://www.employees.org/~drich/ [2]
 "Step up to red alert!" "Are you sure, sir?
 It means changing the bulb in the sign..."
 - Red Dwarf (BBC)

Links:
------
[1] http://www.employees.org/%7Edrich/
[2] http://www.employees.org/%7Edrich/