Hi, folks.

 

We’re going to be running mon on over 1,000 servers, each one monitoring things at a remote site.  Each of these servers reports in (via the “redistribute” command) to a corporate/main monitoring server so that we are aware of a failure out at a remote site.  The corporate server expects traps from each server and monitor check (via the “traptimeout” command).  All of this is currently working correctly.

 

The problem is that we’re going to need to turn the check frequency for several of the monitors at each remote site way up, to something like every 10 seconds (i.e., “interval 10s”).  That means we’re going to see a huge increase in the number of traps arriving at the corporate site.
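
To put a rough number on that increase, here is a back-of-the-envelope sketch. It assumes that every check results in one redistributed trap and that only the two services from the example config further down are involved; the real per-site count may well differ, so treat the constants as placeholders.

```python
# Rough estimate of the trap rate arriving at the corporate server.
# Assumptions (not measurements): one trap per check via "redistribute",
# and two services per site, as in the example configuration.
SITES = 1000
SERVICES_PER_SITE = 2


def traps_per_second(interval_seconds: float) -> float:
    """Traps per second across all sites for a given check interval."""
    return SITES * SERVICES_PER_SITE / interval_seconds


current = traps_per_second(60)  # "interval 1m"
planned = traps_per_second(10)  # "interval 10s"

print(f"interval 1m:  {current:.1f} traps/s")
print(f"interval 10s: {planned:.1f} traps/s ({planned / current:.0f}x)")
```

Under those assumptions the corporate server goes from roughly 33 traps per second to 200, a sixfold increase.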

 

Is there some way to only redistribute alerts from the remote servers every 60 seconds, or perhaps another approach to the problem, like not using “redistribute”?
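
To make the question concrete, the kind of throttling I have in mind might look like the sketch below. This is purely illustrative: `TrapThrottle` and `should_forward` are hypothetical names, nothing here is part of mon, and the idea is only that some wrapper would consult this logic before deciding whether to call trap.alert. A status change is always forwarded right away; an unchanged status is forwarded at most once per 60 seconds, which would still stay inside the 2m traptimeout at the corporate site.

```python
import time
from typing import Dict, Optional, Tuple


class TrapThrottle:
    """Decide whether a status update should be forwarded upstream.

    Hypothetical helper, not part of mon: a status change is always
    forwarded; an unchanged status is forwarded at most once every
    min_gap seconds per service.
    """

    def __init__(self, min_gap: float = 60.0) -> None:
        self.min_gap = min_gap
        # service name -> (last forwarded status, time it was forwarded)
        self._last: Dict[str, Tuple[str, float]] = {}

    def should_forward(self, service: str, status: str,
                       now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        previous = self._last.get(service)
        changed = previous is None or previous[0] != status
        due = previous is None or now - previous[1] >= self.min_gap
        if changed or due:
            self._last[service] = (status, now)
            return True
        return False


throttle = TrapThrottle(min_gap=60.0)
for t, status in [(0, "ok"), (10, "ok"), (20, "fail"), (50, "fail"), (80, "fail")]:
    print(t, status, throttle.should_forward("DRBD", status, now=t))
```

If the wrapper were started as a fresh process for each update, the in-memory dictionary would have to be replaced by something persistent, such as a small state file per service.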

 

 

 

Remote site configuration example:

 

watch BRANCH_SERVER
    service DRBD
        interval 1m
        monitor DRBDCheck.monitor -s me
        description Is my DRBD working?
        redistribute trap.alert mainmonitor
        period wd {Mon-Sun}
            alert trap.alert mainmonitor
            upalert trap.alert mainmonitor
    service My_HB
        interval 1m
        monitor HACheck.monitor -s me
        description Is my heartbeat active?
        redistribute trap.alert mainmonitor
        period wd {Mon-Sun}
            alert trap.alert mainmonitor
            upalert trap.alert mainmonitor

 

 

Corporate site configuration example:

 

    service DRBD
        description Is my DRBD working?
        traptimeout 2m
    service My_HB
        description Is my heartbeat active?
        traptimeout 2m

 

 

Thanks,

Tim

 

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
