On Tue, Feb 03, 2015 at 05:13:02PM +0100, Baptiste wrote:
> On Tue, Feb 3, 2015 at 4:59 PM, Pavlos Parissis
> <[email protected]> wrote:
> > On 01/02/2015 03:15 μμ, Willy Tarreau wrote:
> >> Hi Simon,
> >>
> >> On Fri, Jan 30, 2015 at 11:22:52AM +0900, Simon Horman wrote:
> >>> Hi Willy, Hi All,
> >>>
> >>> the purpose of this email is to solicit feedback on an implementation
> >>> of email alerts for haproxy the design of which is based on a discussion
> >>> in this forum some months ago.
> >
> >
> > It would be great if we could use something like this
> > acl low_capacity nbsrv(foo_backend) lt 2
> > mail alert if low_capacity
> >
> > In some environments you only care to wake up the on-call sysadmin if you
> > are
> > real troubles and not because 1-2 servers failed.
> >
> > Nice work,
> > Pavlos
> >
>
>
>
> This might be doable using monitor-uri and monitor fail directives in
> a dedicated listen section which would fail if number of server in a
> monitored farm goes below a threshold.
>
> That said, this is a dirty hack.

I agree entirely that there is a lot to be said for providing a facility
for alert suppression and escalation. To my mind the current implementation,
which internally works with a queue, lends itself to these kinds of
extensions. The key question, as I see it, is how to design advanced
features such as the one you have suggested in such a way that they can be
useful in a wide range of use-cases.

So far there seem to be three semi-related ideas circulating
on this list. I have added a fourth:

1. Suppressing alerts based on priority.
   e.g. Only send alerts for events whose priority is > x.

2. Combining alerts into a single message.
   e.g. If n alerts are queued up to be sent within time t
        then send them in one message rather than n.

3. Escalating alerts.
   e.g. Only send alerts of priority x if more than n have occurred within
        time t.
   This seems to be a combination of 1 and 2.
   This may or may not involve raising the priority of the resulting
   combined alert (internally or otherwise).
   An extra qualification may be that the events need to relate to
   something common:
   e.g. servers of the same proxy.
        Losing one may not be bad; losing all of them I may wish
        to get out of bed for.

4. Suppressing transient alerts.
   e.g. I may not care if server s goes down then comes back up again
        within time t.
   But I may if it keeps happening. This part seems like a variant of 3.
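
To make ideas 1, 2 and 4 a little more concrete, here is a rough sketch in
Python. It is not haproxy code and does not reflect the current
implementation. The Alert type, the function names, the "larger priority
means more severe" convention and the assumption that alerts arrive sorted
by time are all hypothetical and only there for illustration. Idea 3 is
left out, as its semantics seem the least settled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time: float     # seconds since some arbitrary epoch
    server: str     # hypothetical "proxy/server" name
    priority: int   # assumption: larger means more severe
    up: bool        # True for "server is UP", False for "server is DOWN"


def suppress_by_priority(alerts, x):
    """Idea 1: keep only alerts whose priority is > x."""
    return [a for a in alerts if a.priority > x]


def suppress_transient(alerts, t):
    """Idea 4: drop a DOWN alert, and the UP that follows it, if the
    same server comes back up within time t. Alerts must be time-ordered."""
    dropped = set()
    for i, down in enumerate(alerts):
        if down.up:
            continue
        for j in range(i + 1, len(alerts)):
            later = alerts[j]
            if later.time - down.time > t:
                break
            if later.server == down.server:
                # The next event for this server decides the outcome.
                if later.up:
                    dropped.update((i, j))
                break
    return [a for k, a in enumerate(alerts) if k not in dropped]


def combine(alerts, t):
    """Idea 2: group alerts into batches. Each batch spans at most time t
    from its first alert and would be sent as a single message."""
    batches = []
    for a in alerts:
        if batches and a.time - batches[-1][0].time <= t:
            batches[-1].append(a)
        else:
            batches.append([a])
    return batches


events = [
    Alert(0.0, "foo/s1", 2, up=False),
    Alert(1.0, "foo/s2", 1, up=False),   # low priority, suppressed by idea 1
    Alert(2.0, "foo/s1", 2, up=True),    # s1 flapped, suppressed by idea 4
    Alert(3.0, "foo/s3", 3, up=False),
    Alert(8.0, "foo/s4", 3, up=False),
    Alert(20.0, "foo/s5", 3, up=False),
]
kept = suppress_transient(suppress_by_priority(events, 1), t=5.0)
batches = combine(kept, t=10.0)
```

With these made-up events, s3 and s4 would go out in one message and s5 in
a second one. Chaining the filters over a queue like this is only one
possible arrangement; the order in which they are applied changes the
result, which is part of why I think the design needs some care.
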
I expect we can grow this list of use-cases. I also think things
may become quite complex quite quickly. But it would be nice to implement
something not overly convoluted yet useful.