> > We believe that this is due to an individual machine being down for one
> > check (and then back up again), and then a separate machine being down
> > for the next check (and so on). This is confusing mon, and it thinks that
> > the service is failing.
> >
> > My plan for resolving this was to store failure modes at the host level
> > inside the service, and only send out alerts when an individual host has
> > been down for the failure time. Any idea how mon could do this?
> 
> This is confusing you, not mon. If a host fails, the group fails.
> If you don't want that, consider one group per host.
> 

One trouble with creating one group per host/service is the sheer number of
groups you end up with.  You also specify alertafter/alertevery, etc. at the
host/service level.  If you specify 'alertafter 2 30m', service b should not
alert after one failure just because service a failed one time 15 minutes
ago.  For these reasons, I would have to agree with the original poster that
failures should be tracked at the service/host level, and not the group
level.
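
To make the idea concrete, here is a rough sketch in Python of what
per-host/service tracking could look like.  It is not mon's code and not a
patch against it; the class and method names (FailureTracker,
record_failure) are made up for illustration, and it assumes that
'alertafter 2 30m' means "alert once there have been 2 failures within 30
minutes".

```python
from collections import defaultdict, deque


class FailureTracker:
    """Hypothetical sketch: count failures per (host, service) key
    rather than once for the whole group."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold      # e.g. 2 for 'alertafter 2 30m'
        self.window = window_seconds    # e.g. 30 * 60 for 'alertafter 2 30m'
        self.failures = defaultdict(deque)  # key -> recent failure times

    def record_failure(self, key, now):
        """Record a failure for `key` at time `now` (in seconds).

        Returns True if this key alone has reached the threshold
        within the window, i.e. an alert would be due.
        """
        times = self.failures[key]
        times.append(now)
        # Forget failures of this key that are older than the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


tracker = FailureTracker(threshold=2, window_seconds=30 * 60)

# Service a fails once; 15 minutes later service b fails once.
a_alert = tracker.record_failure(("host1", "a"), now=0)
b_alert = tracker.record_failure(("host1", "b"), now=15 * 60)

# Service b fails again 5 minutes after that.
b_alert_again = tracker.record_failure(("host1", "b"), now=20 * 60)

print(a_alert, b_alert, b_alert_again)
```

With counters kept per key, the single failure of service a does not count
toward service b, so only the second failure of b within the window would
trigger an alert.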

Nicholas Cook
_______________________________________________
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon
