--On Thursday, August 24, 2006 08:21:16 -0500 Tim Carr <[EMAIL PROTECTED]> 
wrote:

> Hi, folks.
>
>
>
> We're going to be running mon on over 1,000 servers (each one is
> monitoring things at a remote site).  Each of these servers/sites is
> reporting in (via the "redistribute" command) to a Corporate/main
> monitoring server so we can be aware of a failure out in the remote
> site.  This corporate site will expect alerts from each server & monitor
> check (via the "traptimeout" command).  All this is currently working
> correctly.
>

That's certainly an impressive setup...  What kind of sites are these?

CMU has three mon servers at our main campus, and one at the CMU-Qatar
campus, running a total of 575K tests per day.  (Most of those tests are on
groups with multiple hosts, including a few groups with hundreds of hosts:
switches, wireless access points, etc.)

The vast majority of those tests generate traps to our master server, but
you might actually be outnumbering us on traps. :)


>
>
> The problem is that we're going to need to turn the monitoring period
> for several of the remote site monitors in each location way up - like
> checking every 10 seconds (i.e., "interval 10s").  That means we're going
> to see a huge increase in the number of traps we're seeing at the
> corporate site.
>
>
>
> Is there some way to only redistribute alerts from the remote servers
> every 60 seconds, or perhaps another approach to the problem, like not
> using "redistribute"?
>
>

You could just send traps on status changes, if you're not worried about 
having the corporate server always have the latest status.  But then you 
wouldn't know if you were dropping traps.

My suggestion would be to create a modified trap.alert for use by
redistribute, and have it maintain state about the last trap it sent for
this group/service.  Just because the script runs doesn't mean it has to
actually send a trap.  I'd try something like:
  - If the last trap sent was a failure trap, send this trap.
  - If this trap is a failure trap, send this trap.
  - If the last trap sent was more than N seconds ago, send this trap.
  - Otherwise, don't send this trap.
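
The four rules above could be sketched roughly as follows.  This is only
an illustration in Python, not a patch to the real trap.alert: the names
should_send, load_last, record_sent and handle_trap, the JSON state file,
and the "send when nothing has been sent yet" case are all assumptions
made for the example, and the actual trap sending is left as a comment.

```python
import json
import time
from pathlib import Path
from typing import Optional


def should_send(last: Optional[dict], is_failure: bool,
                now: float, min_interval: float) -> bool:
    """Apply the four rules to decide whether this trap gets forwarded.

    `last` describes the last trap actually sent for this group/service:
    {"failure": bool, "time": float}, or None if nothing was sent yet.
    """
    if last is None:
        return True   # assumption: no history yet, so send the first trap
    if last["failure"]:
        return True   # last trap sent was a failure trap
    if is_failure:
        return True   # this trap is a failure trap
    # Send only if the last trap went out more than N seconds ago.
    return now - last["time"] > min_interval


def load_last(state_file: Path) -> Optional[dict]:
    """Read the saved state; a missing or unreadable file means no history."""
    try:
        return json.loads(state_file.read_text())
    except (FileNotFoundError, ValueError):
        return None


def record_sent(state_file: Path, is_failure: bool, now: float) -> None:
    """Remember the trap that was just sent."""
    state_file.write_text(json.dumps({"failure": is_failure, "time": now}))


def handle_trap(state_file: Path, is_failure: bool,
                min_interval: float = 60.0,
                now: Optional[float] = None) -> bool:
    """Return True if the trap was (notionally) sent, False if suppressed."""
    now = time.time() if now is None else now
    if not should_send(load_last(state_file), is_failure, now, min_interval):
        return False
    # The real script would send the trap to the corporate server here.
    record_sent(state_file, is_failure, now)
    return True
```

The state has to live outside the process (here, one hypothetical file
per group/service) because the alert script is started fresh each time
it runs.  Note that the state is only updated when a trap is really
sent, so the N-second timer counts from the last trap the corporate
server received, not from the last time the script ran.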


Or we could implement a redistributeevery option, similar to alertevery. 
That wouldn't be too hard, but would take a little work.

-David

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
