The problem is that dependencies were designed primarily for critical
actions (start/stop/restart/monitor/unmonitor), where the correct order
matters.
An alert-only action doesn't trigger the dependency (action chain),
since it may be purely informative.
For example, if you are monitoring ICMP, you can have several error
levels, such as:
--8<--
check host myrouter with address ...
if failed icmp type echo for 3 times within 5 cycles then alert
if failed icmp type echo for 5 cycles then exec "/script/to/power-cycle/router"
--8<--
In this case monit sends an alert when the network has problems but is
not completely dead (some packets are lost) and can still recover on its
own. That shouldn't disable monitoring of the remote hosts. When the
error ratio is 100% for 5 cycles (the second icmp line), monit can exec,
for example, a script to power-cycle the router (via a networked power
switch ... point-to-point or on the same ethernet switch, so that it is
reachable when the router is not available).
So the final solution could be to extend the dependency with an option
that makes the service dependency hard even on an alert (to stop
monitoring the other services).
A workaround could be to define dummy start/stop methods for the
monitored remote hosts and use the restart action instead of alert (it
sends an alert as well). Something like:
--8<--
check host myswitch ...
start program = "/bin/true"
stop program = "/bin/true"
if failed icmp type echo for 5 cycles then restart
check host myrouter ...
start program = "/bin/true"
stop program = "/bin/true"
if failed icmp type echo for 5 cycles then restart
depends on myswitch
--8<--
... not tested, but it may work (although the restart action doesn't
look logical here, it can trigger the dependency in this case as well).
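Applied to the two hosts from the configuration quoted below, the
workaround might look roughly like this. It is equally untested: the
only changes to the quoted configuration are the dummy start/stop
programs and restart in place of alert. The /bin/true path is an
assumption about the system, and the alert address is left exactly as
it appears in the quoted message:
--8<--
# router: dummy start/stop methods so that "restart" is a valid action
# (assumes /bin/true exists at this path)
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
start program = "/bin/true"
stop program = "/bin/true"
if failed ICMP type ECHO count 1 timeout 10 seconds then restart
alert [EMAIL PROTECTED] with reminder on 1 cycle
# host behind the router: same dummy methods, plus the dependency
check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
start program = "/bin/true"
stop program = "/bin/true"
if failed ICMP type ECHO count 1 timeout 10 seconds then restart
alert [EMAIL PROTECTED] with reminder on 1 cycle
depends on ro5000-siNmG20876YFyCu20879
--8<--
The idea is that a failed ping on the router then goes through the
restart action chain instead of only sending a mail, so the dependent
host should be handled as part of that chain.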
Martin
Pablo Iranzo Gómez wrote:
List, here is the output from monit running in interactive mode with
-vv:
From log start:
-----------------------------------------------------------------------
Remote Host Name = ro5000-siNmG20876YFyCu20879
Monitoring mode = active
ICMP = if failed Echo Request count 1 with timeout 10
seconds 1 times within 1 cycle(s) then alert else if passed 1 times
within 1 cycle(s) then alert
Alert mail to = [EMAIL PROTECTED]
Alert on = All events
Alert reminder = 1 cycles
Remote Host Name = pos10.5000-siNmG20876YFyCu20879
Monitoring mode = active
Depends on Service = ro5000-siNmG20876YFyCu20879
ICMP = if failed Echo Request count 1 with timeout 10
seconds 1 times within 1 cycle(s) then alert else if passed 1 times
within 1 cycle(s) then alert
Alert mail to = [EMAIL PROTECTED]
Alert on = All events
Alert reminder = 1 cycles
From Log checking:
-----------------------------------------------------------------------
'ro5000-siNmG20876YFyCu20879' icmp ping failed
'ro5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
ICMP failed notification is sent to [EMAIL PROTECTED]
'ro5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
connection tests
'pos10.5000-siNmG20876YFyCu20879' icmp ping failed
'pos10.5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
ICMP failed notification is sent to [EMAIL PROTECTED]
'pos10.5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
connection tests
Config files:
-----------------------------------------------------------------------
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
if failed ICMP type ECHO count 1 timeout 10 seconds then alert
alert [EMAIL PROTECTED] with reminder on 1 cycle
check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
if failed ICMP type ECHO count 1 timeout 10 seconds then alert
alert [EMAIL PROTECTED] with reminder on 1 cycle
depends on ro5000-siNmG20876YFyCu20879
Any hint?
Thanks in advance,
Pablo
On Mon, 29-10-2007 at 21:51 +0100, Pablo Iranzo Gómez wrote:
Martin,
On Mon, 29 Oct 2007, Martin Pala wrote:
Can you run monit in verbose mode (-v option) and send the log? You'll
see in it what happened in more detail.
Sure, will do it tomorrow early in the morning :)
If I just put "if failed icmp then alert", monit complains about the
configuration (I'm using monit-4.9-1), so either I'm doing something
wrong or it's a problem with this version.
I'm sorry - this was a typo (I wrote the example just from memory, so
the "type echo" was missing).
Don't worry, I was just checking in case I did something wrong :)
Thanks again
Pablo
--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general