Your solution is definitely one way to go and I think it most closely resembles the pre-probe group model. I'll toss out another method which I'm using and might work for you too: The group notifier goes to people who need to know "everything", with the sub-notifiers going to specific teams.

In my scenario I have a job queue which occasionally backs up, so JobQ length is checked by one of several probes on the database server. If a backlog develops I need to tell the admin team (so we can fix it) and the support team (so they can tell customers we're working on it if they get calls).

Our structure (simplified):

DB Server Probe Group -> Notify: Admin Team
- SNMP HR             -> Notify: NONE (Control Probe)
- IPMI/Health Check   -> Notify: NONE
- DB TCP              -> Notify: NONE
- Job Queue Length    -> Notify: Support Team


The net effect is exactly what I need: Admins get paged whenever the machine is unhappy, and the support team gets notified when the job queue backs up (but doesn't get harassed about dead fans or other stuff they can't fix), and nobody gets more than one email about any given issue.
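For what it's worth, here is a rough sketch of that routing in Python. It is only a toy model of the layout above, not anything InterMapper exposes: the names `GROUP_NOTIFY`, `PROBE_NOTIFY`, and `recipients` are made up for illustration, and it assumes the group notifier fires for any probe failure in the group.

```python
# Toy model of the notifier layout above (hypothetical names, not an InterMapper API).
# Assumption: the group-level notifier fires whenever any probe in the group fails.
GROUP_NOTIFY = {"Admin Team"}

# Per-probe notifiers; an empty set means "Notify: NONE".
PROBE_NOTIFY = {
    "SNMP HR": set(),            # control probe
    "IPMI/Health Check": set(),
    "DB TCP": set(),
    "Job Queue Length": {"Support Team"},
}

def recipients(failed_probe):
    """Return everyone who hears about a failure of the given probe."""
    # Set union: each team appears once, so nobody is notified twice.
    return GROUP_NOTIFY | PROBE_NOTIFY[failed_probe]
```

Calling `recipients("Job Queue Length")` gives both teams, while `recipients("IPMI/Health Check")` gives only the admins, which is the split described above.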


-MG

On Sep 2, 2009, at 11:13 AM, Michael Luz wrote:

(forgive this wordy post, but as I type it out, I'm starting to figure out solutions..)

Problem:

We are monitoring a server with several different probes, and thus created a probe group for that server. We're now having some issues with duplicate notifications going out, and I was wondering if you all could look at what I'm currently doing and perhaps suggest a better way of setting it up.

Currently, I have a notification set up for (1) the probe group itself, (2) snmp resources, which is the control probe, (3) port 80, (4) port 443, and (5) port 25. When this server had problems recently, a director received separate emails for the probe group, port 80 going down, and port 443 going down. I think this is accurate, because the control probe (snmp) never went down, which is why separate alerts went out for 80 and 443, but the probe group alert seems a bit redundant.

My question is, can I reduce the number of notifications going out somehow on one device?

Do I actually need a notification on the probe group if I have notifications on each probe inside? Or... can I turn off all the probes inside, and let the probe group notification handle everything? (if the probe group is the only notifier, what if it turns red for port 80, then later 443 goes down, will another alert go out? In that instance we WANT another to go out..)

Also, regardless of the above questions, I should make sure that my control probe (usually SNMP) polls more frequently than the other probes to reduce the chance of multiple notifications going out if the server goes down... correct?

So I'm thinking of setting up my notifications as follows:

SERVER1 Probe Group - No notifications
--SERVER1 SNMP Probe - 2 minute poll, no delay on notify
--SERVER1 TCP Port 80 Probe - 2 minutes, 2 minute delay on notify
--SERVER1 TCP Port 443 - 2 minutes, 2 minute delay on notify
--SERVER1 TCP Port 25 - 2 minutes, 2 minute delay on notify

This should reduce the number of duplicates (w/ the probe group notifier) and also by having the delay on the "other" probes, it gives the control probe a chance to actually trigger first and prevent the other notifiers...
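To check my own reasoning about the delay, here is a toy timing model in Python. It is not how InterMapper schedules anything internally. It simply assumes that a non-control probe's alert is suppressed if the control probe has already been detected down by the time that alert would fire. The function name `notifying_probes` and the probe labels are made up for illustration; times are in seconds after the outage starts.

```python
def notifying_probes(detect_at, notify_delay, control):
    """Return the probes that would send a notification (toy model).

    detect_at:    probe name -> seconds after the outage when its poll sees the failure
    notify_delay: probe name -> seconds to wait before notifying (missing means 0)
    control:      name of the control probe
    """
    control_down_at = detect_at.get(control)  # None if the control probe stays up
    senders = []
    for probe, detected in detect_at.items():
        fire_at = detected + notify_delay.get(probe, 0)
        if (probe != control
                and control_down_at is not None
                and control_down_at <= fire_at):
            # Assumed behavior: the control probe is already down, so skip this alert.
            continue
        senders.append(probe)
    return sorted(senders)

# Whole server dies; each probe notices at its next poll inside the 2-minute cycle.
outage = {"SNMP": 90, "Port 80": 10, "Port 443": 50}
no_delay = notifying_probes(outage, {}, "SNMP")
with_delay = notifying_probes(outage, {"Port 80": 120, "Port 443": 120}, "SNMP")

# Only port 80 fails; the control probe never goes down.
web_only = notifying_probes({"Port 80": 10}, {"Port 80": 120}, "SNMP")
```

Under these assumptions, `no_delay` contains all three probes (the port probes noticed before SNMP did), `with_delay` contains only SNMP, and `web_only` still contains Port 80, just two minutes later. With equal 2-minute poll intervals, a 2-minute delay always leaves the control probe enough time to be detected first.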

Good idea, am I on the right track??  Thanks for your input!

Michael


____________________________________________________________________
List archives:
http://www.mail-archive.com/intermapper-talk%40list.dartware.com/
To unsubscribe: send email to: [email protected]

