Ok, more details:

Here's the stripped-down test config:

-------------------------------------------
basedir      = /usr/local/mon
cfbasedir    = /usr/local/mon/etc
alertdir     = /usr/local/mon/alert
mondir       = /usr/local/mon/monitors
statedir     = /usr/local/mon/run
logdir       = /usr/local/mon/log
authfile     = /usr/local/mon/etc/auth.cf
userfile     = /usr/local/mon/etc/mon.passwd
pidfile      = /usr/local/mon/run/mon.pid
maxprocs     = 20
histlength   = 100
dep_behavior = a
                # m == suppress monitor, a == suppress alert
dtlogging     = yes
dtlogfile     = /usr/local/mon/run/dtlog
#randstart     = 60s

authtype = userfile

####################

hostgroup TEST1 TEST1

watch TEST1
        service TEST1
                depend SELF:always_fail
                monitor always_fail ;;
                interval 10s
                period mail: wd {Sun-Sat}
                        alertevery 5m
                        alert TEST1.action
        service always_fail
                monitor always_fail
                interval 5m

####################
-------------------------------------------

'always_fail' looks like this:

-------------------------------------------
#!/bin/csh

echo "Woe, I have failed"
exit 1
-------------------------------------------

When I test it, I get this:

-------------------------------------------
    basedir     [/usr/local/mon]
    cfbasedir   [/usr/local/mon/etc]

    cf          [/usr/local/mon/etc/mon.cf]
    statedir    [/usr/local/mon/run]
    logdir      [/usr/local/mon/log]
    authfile    [/usr/local/mon/etc/auth.cf]
    ocfile      [/usr/local/mon/etc/oncall.cf]
    userfile    [/usr/local/mon/etc/mon.passwd]
    dtlogfile   [/usr/local/mon/run/dtlog]
    historicfile[]
    monerrfile  [/dev/null]
    scriptdir   [/usr/local/mon/monitors]
    alertdir    [/usr/local/mon/alert]
M always_fail=[/usr/local/mon/monitors/always_fail]
A TEST1.action=[/usr/local/mon/alert/TEST1.action]
0
1
2
3
4
5
6
7
8
9
10
watching file handle 7 for TEST1/TEST1
select returned 0 file handles
11
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x1f15dc)
EOF on FileHandle=GLOB(0x1f15dc)
PID 8248 (TEST1/TEST1) exited with [1]
checking DEP [TEST1:always_fail]
  found root dep TEST1,always_fail
  (TEST1,always_fail) 0 depend=[1][TEST1:always_fail]  depend=[1]
  before eval: [1]  after eval: [1]
12
-------------------------------------------

... and the alert for TEST1/TEST1 fires.
That doesn't seem right, especially since if I make the dependency
TEST1/always_fail run first (by switching its interval from 5m to 5s),
I get this:

-------------------------------------------
    basedir     [/usr/local/mon]
    cfbasedir   [/usr/local/mon/etc]

    cf          [/usr/local/mon/etc/mon.cf]
    statedir    [/usr/local/mon/run]
    logdir      [/usr/local/mon/log]
    authfile    [/usr/local/mon/etc/auth.cf]
    ocfile      [/usr/local/mon/etc/oncall.cf]
    userfile    [/usr/local/mon/etc/mon.passwd]
    dtlogfile   [/usr/local/mon/run/dtlog]
    historicfile[]
    monerrfile  [/dev/null]
    scriptdir   [/usr/local/mon/monitors]
    alertdir    [/usr/local/mon/alert]
M always_fail=[/usr/local/mon/monitors/always_fail]
A TEST1.action=[/usr/local/mon/alert/TEST1.action]
0
1
2
3
4
5
watching file handle 7 for TEST1/always_fail
select returned 0 file handles
6
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x49ae30)
EOF on FileHandle=GLOB(0x49ae30)
PID 8274 (TEST1/always_fail) exited with [1]
7
8
9
10
watching file handle 7 for TEST1/TEST1
select returned 0 file handles
11
watching file handle 8 for TEST1/always_fail
select returned 1 file handles
[Woe, I have failed
] from FileHandle=GLOB(0x49afd4)
EOF on FileHandle=GLOB(0x49afd4)
PID 8275 (TEST1/TEST1) exited with [1]
checking DEP [TEST1:always_fail]
  found root dep TEST1,always_fail
  (TEST1,always_fail) 0 depend=[0][TEST1:always_fail]  depend=[0]
  before eval: [0]  after eval: [0]
alert for TEST1,TEST1 supressed because of dep fail
12
-------------------------------------------
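
For reference, the only thing I changed for that second run was the
interval on the dependency, so its service block read as follows
(everything else in the config is as shown at the top):

-------------------------------------------
        service always_fail
                monitor always_fail
                interval 5s
-------------------------------------------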

What am I doing wrong?

-Luke
