Over time we've been slowly modifying the code a little and adding our 
own features.

Two we've found really useful are "Ack All", which acks everything in the 
current view, and a hold feature, which lets us stop alerts going out for 
up to 180 minutes while still seeing what has failed. The hold records 
who put Mon into hold and their reason. At the end of the 180 minutes (or 
a shorter period, if one was specified) Mon automatically comes out of 
hold and alerts resume, so nobody can accidentally leave it on hold the 
way we could when we stopped the scheduler (which also had the 
disadvantage that we couldn't see what was down).
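
The hold behaviour described above can be sketched roughly as follows. 
This is an illustrative Python sketch only, not the actual modification 
to Mon (which is written in Perl); the names AlertGate, place_hold, 
on_hold and report_failure are hypothetical, and only the 180 minute 
limit, the who/reason fields and the automatic expiry come from the 
description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

MAX_HOLD_MINUTES = 180  # upper limit taken from the description above


@dataclass
class Hold:
    who: str              # person who put monitoring into hold
    reason: str           # their stated reason
    expires_at: datetime  # moment the hold lapses on its own


class AlertGate:
    """Hypothetical gate: suppresses alerts during a hold, still records failures."""

    def __init__(self) -> None:
        self.hold: Optional[Hold] = None
        self.failures: List[str] = []

    def place_hold(self, who: str, reason: str, minutes: int, now: datetime) -> None:
        # Refuse anything longer than the maximum, so a hold can never be open-ended.
        if not 0 < minutes <= MAX_HOLD_MINUTES:
            raise ValueError("hold must be between 1 and %d minutes" % MAX_HOLD_MINUTES)
        self.hold = Hold(who, reason, now + timedelta(minutes=minutes))

    def on_hold(self, now: datetime) -> bool:
        # An expired hold is cleared automatically; nobody has to remember to lift it.
        if self.hold is not None and now >= self.hold.expires_at:
            self.hold = None
        return self.hold is not None

    def report_failure(self, service: str, now: datetime) -> bool:
        # The failure is always recorded, so it stays visible during a hold.
        self.failures.append(service)
        # The return value says whether an alert should actually be sent.
        return not self.on_hold(now)
```

The point of storing an expiry time rather than an on/off flag is that 
forgetting to lift the hold is harmless: the next check after the expiry 
clears it and alerts resume.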

Stephane Bortzmeyer wrote:
> On Wed, Mar 12, 2008 at 12:07:38PM -0400,
>  Ed Ravin <[EMAIL PROTECTED]> wrote 
>  a message of 23 lines which said:
>
>   
>> In most cases, our engineers log into Mon and use the "host disable"
>> or "service disable" to stop monitoring the stuff that's about to go
>> down, and re-enable them when the maintenance is over.
>>
>> Sometimes, we just ACK whatever's broken when Mon starts alarming.
>>     
>
> The good thing about "doing nothing when there is a planned
> maintenance" is that it allows you to test that monitoring indeed
> works.
>
> Several times I have had the bad experience of an undetected failure
> because the monitoring had a hidden problem.
>
> _______________________________________________
> mon mailing list
> mon@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/mon
>   


-- 
Ben Ragg - Internode - Network Operations
150 Grenfell Street, Adelaide, SA, 5000
Phone: 13NODE Web: http://www.on.net

