OK, more details.
Here's the stripped-down test config:
-------------------------------------------
basedir = /usr/local/mon
cfbasedir = /usr/local/mon/etc
alertdir = /usr/local/mon/alert
mondir = /usr/local/mon/monitors
statedir = /usr/local/mon/run
logdir = /usr/local/mon/log
authfile = /usr/local/mon/etc/auth.cf
userfile = /usr/local/mon/etc/mon.passwd
pidfile = /usr/local/mon/run/mon.pid
maxprocs = 20
histlength = 100
dep_behavior = a
# m == suppress monitor, a == suppress alert
dtlogging = yes
dtlogfile = /usr/local/mon/run/dtlog
#randstart = 60s
authtype = userfile
####################
hostgroup TEST1 TEST1
watch TEST1
    service TEST1
        depend SELF:always_fail
        monitor always_fail ;;
        interval 10s
        period mail: wd {Sun-Sat}
            alertevery 5m
            alert TEST1.action
    service always_fail
        monitor always_fail
        interval 5m
####################
-------------------------------------------
'always_fail' looks like this:
-------------------------------------------
#!/bin/csh
echo "Woe, I have failed"
exit 1
-------------------------------------------
When I try to test it, I get this:
-------------------------------------------
basedir [/usr/local/mon]
cfbasedir [/usr/local/mon/etc]
cf [/usr/local/mon/etc/mon.cf]
statedir [/usr/local/mon/run]
logdir [/usr/local/mon/log]
authfile [/usr/local/mon/etc/auth.cf]
ocfile [/usr/local/mon/etc/oncall.cf]
userfile [/usr/local/mon/etc/mon.passwd]
dtlogfile [/usr/local/mon/run/dtlog]
historicfile[]
monerrfile [/dev/null]
scriptdir [/usr/local/mon/monitors]
alertdir [/usr/local/mon/alert]
M always_fail=[/usr/local/mon/monitors/always_fail]
A TEST1.action=[/usr/local/mon/alert/TEST1.action]
0
1
2
3
4
5
6
7
8
9
10
watching file handle 7 for TEST1/TEST1
select returned 0 file handles
11
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x1f15dc)
EOF on FileHandle=GLOB(0x1f15dc)
PID 8248 (TEST1/TEST1) exited with [1]
checking DEP [TEST1:always_fail]
found root dep TEST1,always_fail
(TEST1,always_fail) 0 depend=[1][TEST1:always_fail] depend=[1]
before eval: [1] after eval: [1]
12
-------------------------------------------
... and the alert for TEST1/TEST1 fires.
That doesn't seem quite right, especially since if I have the dependency
TEST1/always_fail run first (by switching its interval to 5s, not 5m),
I get this:
-------------------------------------------
basedir [/usr/local/mon]
cfbasedir [/usr/local/mon/etc]
cf [/usr/local/mon/etc/mon.cf]
statedir [/usr/local/mon/run]
logdir [/usr/local/mon/log]
authfile [/usr/local/mon/etc/auth.cf]
ocfile [/usr/local/mon/etc/oncall.cf]
userfile [/usr/local/mon/etc/mon.passwd]
dtlogfile [/usr/local/mon/run/dtlog]
historicfile[]
monerrfile [/dev/null]
scriptdir [/usr/local/mon/monitors]
alertdir [/usr/local/mon/alert]
M always_fail=[/usr/local/mon/monitors/always_fail]
A TEST1.action=[/usr/local/mon/alert/TEST1.action]
0
1
2
3
4
5
watching file handle 7 for TEST1/always_fail
select returned 0 file handles
6
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x49ae30)
EOF on FileHandle=GLOB(0x49ae30)
PID 8274 (TEST1/always_fail) exited with [1]
7
8
9
10
watching file handle 7 for TEST1/TEST1
select returned 0 file handles
11
watching file handle 8 for TEST1/always_fail
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x49afd4)
EOF on FileHandle=GLOB(0x49afd4)
PID 8275 (TEST1/TEST1) exited with [1]
checking DEP [TEST1:always_fail]
found root dep TEST1,always_fail
(TEST1,always_fail) 0 depend=[0][TEST1:always_fail] depend=[0]
before eval: [0] after eval: [0]
alert for TEST1,TEST1 supressed because of dep fail
12
-------------------------------------------
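To spell out what I think the two traces show, here is a small sh sketch of the behaviour. It is only my guess from the depend=[1] and depend=[0] lines, not anything taken from the mon source: it assumes a dependency that has not run yet evaluates to 1 (treated as up), and that with dep_behavior = a the alert is suppressed only when the dependency evaluates to 0. The function names dep_value and alert_decision are made up for illustration.

```sh
#!/bin/sh
# Hypothetical model of the behaviour seen in the two traces above.
# This is NOT mon's code; names and logic are assumptions.

# dep_value STATUS
#   STATUS is "untested" (never run), "ok", or "fail".
#   Prints 0 only for a dependency that has run and failed;
#   an untested dependency is assumed to count as up (1).
dep_value() {
    case "$1" in
        fail) echo 0 ;;
        *)    echo 1 ;;
    esac
}

# alert_decision STATUS
#   Models dep_behavior = a: the alert is suppressed only
#   when the dependency evaluates to 0.
alert_decision() {
    if [ "$(dep_value "$1")" -eq 1 ]; then
        echo "alert fires"
    else
        echo "alert suppressed"
    fi
}

alert_decision untested   # first trace: always_fail has not run yet
alert_decision fail       # second trace: always_fail already failed
```

If that guess is right, the first run matches "alert fires" and the second matches "alert suppressed", which would mean the result depends purely on which service happens to run first.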
What am I doing wrong?
-Luke