On 04/02/2015 01:26 AM, Simon Horman wrote:
> On Tue, Feb 03, 2015 at 05:13:02PM +0100, Baptiste wrote:
>> On Tue, Feb 3, 2015 at 4:59 PM, Pavlos Parissis
>> <pavlos.paris...@gmail.com> wrote:
>>> On 01/02/2015 03:15 PM, Willy Tarreau wrote:
>>>> Hi Simon,
>>>>
>>>> On Fri, Jan 30, 2015 at 11:22:52AM +0900, Simon Horman wrote:
>>>>> Hi Willy, Hi All,
>>>>>
>>>>> the purpose of this email is to solicit feedback on an implementation
>>>>> of email alerts for haproxy the design of which is based on a discussion
>>>>> in this forum some months ago.
>>>
>>>
>>> It would be great if we could use something like this
>>> acl low_capacity nbsrv(foo_backend) lt 2
>>> mail alert if low_capacity
>>>
>>> In some environments you only want to wake up the on-call sysadmin if
>>> you are in real trouble, not because 1-2 servers failed.
>>>
>>> Nice work,
>>> Pavlos
>>>
>>
>>
>>
>> This might be doable using the monitor-uri and monitor fail directives in
>> a dedicated listen section, which would fail if the number of servers in a
>> monitored farm goes below a threshold.
>>
>> That said, this is a dirty hack.
> 
> I agree entirely that there is a lot to be said for providing a facility
> for alert suppression and escalation. To my mind the current implementation,
> which internally works with a queue, lends itself to these kinds of
> extensions. The key question in my mind is how to design advanced features
> such as the one you have suggested so that they are useful in a
> wide range of use-cases.
> 
> So far there seem to be three semi-related ideas circulating
> on this list. I have added a fourth:
> 
> 1. Suppressing alerts based on priority.
>    e.g. Only send alerts for events whose priority is > x.
> 
> 2. Combining alerts into a single message.
>    e.g. If n alerts are queued up to be sent within time t
>         then send them in one message rather than n.
> 
> 3. Escalate alerts
>    e.g. Only send alerts of priority x if more than n have occurred within
>         time t.
>    This seems to be a combination of 1 and 2.
>    This may or may not involve raising the priority of the resulting
>    combined alert (internally or otherwise).
> 
>    An extra qualification may be that the events need to relate to something
>    common:
>    e.g. servers of the same proxy
>         Losing one may not be bad, but losing all of them is something
>         I may wish to get out of bed for.
> 
> 4. Suppressing transient alerts
>    e.g. I may not care if server s goes down then comes back up again
>         within time t.
>    But I may if it keeps happening. This part seems like a variant of 3.
> 
> 
> I expect we can grow this list of use-cases. I also think things
> may become quite complex quite quickly. But it would be nice to implement
> something not overly convoluted yet useful.
> 
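To make idea 3 a little more concrete, here is a minimal sketch in C of an escalation filter that only fires once more than n events have been seen within a sliding window of t seconds. It is only an illustration under stated assumptions: the names (struct alert_window, alert_window_init(), alert_window_record(), ALERT_WINDOW_MAX) are hypothetical and not part of HAProxy or of the patch set under discussion, and it assumes each event carries a non-decreasing timestamp in whole seconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not HAProxy code: escalate only when more than
 * `threshold` events fall inside a sliding window of `window` seconds. */
#define ALERT_WINDOW_MAX 64

struct alert_window {
	time_t stamps[ALERT_WINDOW_MAX]; /* ring buffer of event timestamps */
	size_t head;                     /* index of the oldest stored timestamp */
	size_t count;                    /* number of timestamps currently stored */
	size_t threshold;                /* escalate when count > threshold (keep < ALERT_WINDOW_MAX) */
	time_t window;                   /* width of the sliding window, in seconds */
};

void alert_window_init(struct alert_window *w, size_t threshold, time_t window)
{
	w->head = 0;
	w->count = 0;
	w->threshold = threshold;
	w->window = window;
}

/* Record one event at time `now` (assumed non-decreasing).
 * Returns true when an escalated alert should be sent. */
bool alert_window_record(struct alert_window *w, time_t now)
{
	/* Forget events that have fallen out of the window. */
	while (w->count > 0 && now - w->stamps[w->head] >= w->window) {
		w->head = (w->head + 1) % ALERT_WINDOW_MAX;
		w->count--;
	}

	/* If the buffer is full, drop the oldest event to make room. */
	if (w->count == ALERT_WINDOW_MAX) {
		w->head = (w->head + 1) % ALERT_WINDOW_MAX;
		w->count--;
	}

	/* Append the new event after the newest stored one. */
	w->stamps[(w->head + w->count) % ALERT_WINDOW_MAX] = now;
	w->count++;

	return w->count > w->threshold;
}
```

With a threshold of 2 and a 60-second window, events at t=0, 10 and 20 escalate on the third event, while a later event at t=100 starts counting from scratch because the earlier ones have expired. The "servers of the same proxy" qualification could be approximated by keeping one such window per backend, but that is an assumption about how it might be wired in, not something the current implementation does.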


What you have done so far provides the basic 'monitoring' alert
functionality, and it is the first step towards something that can become
bigger and better, but also complex, as you say.

The functionality you have listed is covered by several monitoring
systems, either simple ones like Nagios or 'smart' ones that apply
real-time anomaly detection (Skyline, etc.), which either actively probe
services or passively receive events.

HAProxy is just another service inside a data center that produces
events: servers go down/up, traffic dips/spikes, etc.

In small companies that can't afford a centralized monitoring system
and prefer to just receive e-mails from ~10 systems, having some
monitoring intelligence (aggregation, alerts based on thresholds) built
in is perfect and very much appreciated.

But in large installations with 10K servers and 400 services, you want
to receive raw events without any aggregation and let the 'smart'
monitoring system figure out what to do before it wakes up the on-call
sysadmin (I am one of them).

To sum up, the data currently exposed over the stats socket satisfies
the needs of large installations. I know this because I work in an
environment where we use these 'smart' monitoring systems, and I am
quite happy with the amount of data HAProxy exposes.

At my friend's start-up, which has 8 services, I don't want to develop
scripts/tools to pull information from the stats socket: just mail me
and I will alert myself based on the number of e-mails I receive. If
HAProxy can do some kind of aggregation/thresholding, my mailbox will
thank HAProxy a lot.

I hope this helps, and once again thanks for your hard work,
Pavlos
