Your solution is definitely one way to go and I think it most closely
resembles the pre-probe group model.
I'll toss out another method which I'm using and might work for you
too: The group notifier goes to people who need to know "everything",
with the sub-notifiers going to specific teams.
In my scenario I have a job queue which occasionally backs up, so JobQ
length is checked by one of several probes on the database server. If
a backlog develops I need to tell the admin team (so we can fix it)
and the support team (so they can tell customers we're working on it
if they get calls).
Our structure (simplified):
DB Server Probe Group -> Notify: Admin Team
- SNMP HR -> Notify: NONE (Control Probe)
- IPMI/Health Check -> Notify: NONE
- DB TCP -> Notify: NONE
- Job Queue Length -> Notify: Support Team
The net effect is exactly what I need: Admins get paged whenever the
machine is unhappy, and the support team gets notified when the job
queue backs up (but doesn't get harassed about dead fans or other
stuff they can't fix), and nobody gets more than one email about any
given issue.
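
For what it's worth, here is a rough Python sketch of that routing. It is only a toy model, not InterMapper's API or configuration format. The names GROUP_NOTIFIERS, PROBE_NOTIFIERS and recipients are made up for illustration, and it assumes a failing probe alerts its own notifier list plus the list attached to the enclosing probe group:

```python
# Toy model of the notifier layout above; NOT InterMapper's API.
# Assumption: a failing probe alerts its own notifiers plus the
# notifiers attached to the probe group that contains it.

GROUP_NOTIFIERS = ["Admin Team"]

PROBE_NOTIFIERS = {
    "SNMP HR": [],                      # control probe, no notifier of its own
    "IPMI/Health Check": [],
    "DB TCP": [],
    "Job Queue Length": ["Support Team"],
}


def recipients(failed_probe):
    """Return the teams alerted when one probe fails, each listed once."""
    teams = []
    for team in GROUP_NOTIFIERS + PROBE_NOTIFIERS[failed_probe]:
        if team not in teams:
            teams.append(team)
    return teams


print(recipients("IPMI/Health Check"))  # ['Admin Team']
print(recipients("Job Queue Length"))   # ['Admin Team', 'Support Team']
```

Under those assumptions a dead fan reaches only the admins, while a queue backlog reaches both teams exactly once.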
-MG
On Sep 2, 2009, at 11:13 AM, Michael Luz wrote:
(forgive this wordy post, but as I type it out, I'm starting to
figure out solutions..)
Problem:
We are monitoring a server with several different probes, and thus
created a probe group for that server. We're now having some issues
with duplicate notifications going out, and I was wondering if you
all can look at what I'm currently doing and perhaps suggest a better
way of setting it up.
Currently, I have a notification set up for (1) the probe group
itself, (2) snmp resources, which is the control probe, (3) port 80,
(4) port 443, and (5) port 25. When this server had problems
recently, a director received separate emails for the probe group,
port 80 going down, and port 443 going down. I think this is
accurate, because the control probe (snmp) never went down, so that's
why separate alerts went out for 80 and 443, but the probe group
alert seems a bit redundant.
My question is, can I reduce the number of notifications going out
somehow on one device?
Do I actually need a notification on the probe group if I have
notifications on each probe inside? Or... can I turn off all the
probes inside, and let the probe group notification handle
everything? (if the probe group is the only notifier, what if it
turns red for port 80, then later 443 goes down, will another alert
go out? In that instance we WANT another to go out..)
Also, regardless of the above questions, I should make sure that my
control probe (usually SNMP) polls more frequently than the other
probes to reduce the chance of multiple notifications going out if
the server goes down... correct?
So I'm thinking of setting up my notifications as follows:
SERVER1 Probe Group - No notifications
--SERVER1 SNMP Probe - 2 minute poll, no delay on notify
--SERVER1 TCP Port 80 Probe - 2 minutes, 2 minute delay on notify
--SERVER1 TCP Port 443 - 2 minutes, 2 minute delay on notify
--SERVER1 TCP Port 25 - 2 minutes, 2 minute delay on notify
This should reduce the number of duplicates (w/ the probe group
notifier) and also by having the delay on the "other" probes, it
gives the control probe a chance to actually trigger first and
prevent the other notifiers...
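
The timing argument above can be sketched as a small Python simulation. This is a toy model only, not how InterMapper is actually implemented. It assumes a dependent probe's notification is dropped if the control probe is already known to be down when the notify delay expires. The names NOTIFY_DELAY, CONTROL and alerts_sent are hypothetical, and times are in minutes:

```python
# Toy timeline of the proposed settings; NOT InterMapper's real logic.
# Assumption: a non-control probe's alert is suppressed if the control
# probe has been detected down by the time the notify delay runs out.

NOTIFY_DELAY = {"SNMP": 0, "TCP 80": 2, "TCP 443": 2, "TCP 25": 2}
CONTROL = "SNMP"


def alerts_sent(detected_down_at):
    """detected_down_at maps probe name -> minute its poll first failed.

    Returns the probes whose notification actually goes out,
    in order of detection time.
    """
    control_down = detected_down_at.get(CONTROL)
    sent = []
    for probe, seen in sorted(detected_down_at.items(), key=lambda kv: kv[1]):
        fire_at = seen + NOTIFY_DELAY[probe]
        if probe != CONTROL and control_down is not None and control_down <= fire_at:
            continue  # control probe already down: suppress this alert
        sent.append(probe)
    return sent


# Whole server dies; port 80 happens to be polled before SNMP.
print(alerts_sent({"TCP 80": 0, "SNMP": 1, "TCP 443": 1.5}))  # ['SNMP']

# Only the web ports die; SNMP stays up, so both alerts go out.
print(alerts_sent({"TCP 80": 0, "TCP 443": 1}))  # ['TCP 80', 'TCP 443']
```

In this model the 2 minute delay covers the worst case where a port probe is polled just before the control probe, which is the duplicate-alert case the delay is meant to prevent.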
Good idea, am I on the right track?? Thanks for your input!
Michael
____________________________________________________________________
List archives:
http://www.mail-archive.com/intermapper-talk%40list.dartware.com/
To unsubscribe: send email to: [email protected]