On 01.11.2014 at 21:32, Darko Hojnik wrote:
Hi there,

For some days now I have been trying to configure notifications to fit our 
needs, without success. I am writing this first to ask for help, but also as 
feedback, not to start a pointless rant or another stupid flame war. So I am 
now asking myself: are these bugs, missing features, or a misconfiguration? 
Should I give up? Is Icinga 2 stable and good enough for my company?

The first requirement is that on 97% of our infrastructure, people must be 
notified only when a service goes into a critical state, and notified again 
when the event has been acknowledged or the service has gone back to the OK 
state. That sounds like a simple task, but it is not.

I do understand that it's sometimes frustrating when something does not work, and no one is there to take a look or help out.

But you should also consider what others might be doing, or why there is no simple answer.

I myself am in the middle of writing Icinga 2 documentation (yes, right now too, because we're behind schedule for the 2.2 release), planning the 1.12 release, and preparing a talk for OSMC in 2.5 weeks. Other Icinga developers probably have enough to do on their own.

And your problem doesn't sound simple. If I got that correctly, you're using an 8 node cluster (did you see my reply to your thread some hours ago? there was a question there too).

When a problem is not simple, it will take everyone _time_. Time to read, understand, reproduce, and write an answer that you find useful or can get an idea from.

Still, it's open source software, and many of us do it for fun when not at work (at netways it's also fun at work, but that's different). Nagging us or bumping questions won't help much - unless it sounds mission critical, or someone finds the problem challenging to dig into - like the 1.x core reload problem recently.

Nevertheless, you should keep in mind that answers may take a while here.

If you think you've found a bug, then please do report one. The developers (not only me) appreciate any feedback and possible issues.

Still, the same as before applies - if no one looks into it or comments on it, it's a matter of to-dos and time. If you want to sponsor development time for resolving bugs - sure, just do it.



We are running FreeBSD 10 and Icinga 2.1.1 on our nodes. Almost all of our 
servers run FreeBSD.

Here is some information that I hope is helpful.

my template,

template Notification "mail-service-notification" {
   command = "mail-service-notification"

   states = [ OK, Warning, Critical, Unknown ]

   types = [ Problem, Acknowledgement, Recovery, Custom,
             FlappingStart, FlappingEnd,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
}

my commands,

object NotificationCommand "mail-host-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/mail-host-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     HOSTSTATE = "$host.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     HOSTOUTPUT = "$host.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     USEREMAIL = "$user.email$"
   }
}

object NotificationCommand "sms-host-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/sms-host-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     HOSTSTATE = "$host.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SHORTDATETIME = "$icinga.short_date_time$"
     HOSTOUTPUT = "$host.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     PAGER = "$user.pager$"
   }
}

object NotificationCommand "mail-service-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/mail-service-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     SERVICEDESC = "$service.name$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     SERVICESTATE = "$service.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SERVICEOUTPUT = "$service.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     SERVICEDISPLAYNAME = "$service.display_name$"
     USEREMAIL = "$user.email$"
   }
}

object NotificationCommand "sms-service-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/sms-service-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     SERVICEDESC = "$service.name$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     SERVICESTATE = "$service.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SHORTDATETIME = "$icinga.short_date_time$"
     SERVICEOUTPUT = "$service.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     SERVICEDISPLAYNAME = "$service.display_name$"
     PAGER = "$user.pager$"
   }
}

my inherited notifications

apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

As described above, notifications should be sent for the critical and recovery 
states. But recovery notifications were also sent when the previous state had 
been warning.

I would remove times.begin here. It doesn't make much sense in your example, and it will certainly interfere with testing when notifications are sent.

Other than that:

Your assumption of when a recovery occurs is wrong. There never was any difference between warning, critical and unknown - they are all treated as NOT-OK, and will trigger a hard state change, no matter which of those states increased the check attempt counter before.

So basically, a notification can even happen like this:

OK (soft) -> WARNING (1) -> CRITICAL (2) -> WARNING (3) ---> hard state change

But your users (contacts in Icinga 1/Nagios) wouldn't be notified, just because their filters don't allow Warning conditions.

In a similar fashion, a recovery only happens when returning from NOT-OK to OK. Icinga 1.x and 2.x don't store information on who was notified by which state/type filter, so they cannot take that into account upon service recovery when deciding who finally gets a notification.
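
To make the attempt counter in the sequence above more concrete, here is a minimal sketch of the service attributes that control it. The template name "three-attempt-service" and the interval values are assumptions for illustration only, not taken from your setup; max_check_attempts, check_interval and retry_interval are the regular Icinga 2 service attributes.

```icinga2
template Service "three-attempt-service" {
   // Hypothetical template name; all values are illustrative assumptions.
   max_check_attempts = 3   // consecutive NOT-OK results before the state becomes hard
   check_interval = 5m      // regular check interval
   retry_interval = 1m      // shorter interval used while the state is still soft
}
```

With values like these, WARNING, CRITICAL and WARNING on three consecutive checks should count as attempts 1, 2 and 3, and the third one triggers the hard state change, regardless of which NOT-OK state each check returned.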

Feel free to open or sponsor a feature request, but I don't think that this will be an easy task to implement.

You're not alone, though: that topic has been discussed for many years now, but no one has fixed it (only external addons such as NoMa handle it, with their own filter capabilities).

So basically Icinga 1.x works like Icinga 2.x in this regard, but most likely you don't know Icinga 1.x or Nagios and haven't encountered this feature, or shortcoming. Which of the two it is depends on each user's opinion.



apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

With the configuration above, notifications were sent for the critical state, 
but not a single notification for the recovery.

In conclusion, this means that people get service messages with state OK even 
though they never saw a preceding message for the warning state. That confuses 
people and reduces the acceptance of Icinga 2.

How can I achieve that notifications are sent only for the critical state, 
followed by a recovery message once the issue no longer exists? That is one of 
the things I want most.


The second issue is that notifications are not sent at the right time.


apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

The first notification arrived more than 30 minutes late, and the 
re-notification was also sent very late in my test environment. And 
sometimes nothing happened at all.

[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for 
notification object 'dev3!Raid Status!mail-service-standard': before escalation 
range
[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for 
notification object 'dev3!Raid Status!sms-standard': before escalation range

[2014-10-30 19:55:15 +0000] information/NotificationComponent: Sending reminder 
notification for object 'dev3!Raid Status'
[2014-10-30 19:55:15 +0000] notice/Notification: Not sending notifications for 
notification object 'dev3!Raid Status!sms-standard': before escalation range

Could you explain a bit more how these lines correlate with your observations in your test lab? The only thing I see here is that times.begin skips the initial notification. It does not delay it, though.



So my cruel crappy hack is

apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 10m
   times.begin = 3m
}

That forces a notification to be sent immediately and then every 10 minutes. 
Is this really the way to get notifications sent at the right time?

As said, I would remove the escalation range (times.begin), keep in mind that NOT-OK states will always generate recovery notifications, and then test again.
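
As a starting point for that test, a single notification object along these lines might be enough. This is only a sketch: it assumes your existing "mail-service-notification" template, your user group and your sla variable, the object name "mail-service-standard" is borrowed from your log excerpt, and I have not tested it against your cluster. It keeps OK in the state filter because you reported getting no recovery mails without it.

```icinga2
apply Notification "mail-service-standard" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"

   // OK stays in the filter so that Recovery notifications pass it.
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"

   // No times.begin: the first notification should go out on the hard
   // state change, with reminders every 120 minutes while the problem lasts.
   interval = 120m
}
```

With one object like this there should be no need for separate first and re-notification objects. A recovery after a hard WARNING will still be sent, for the reasons explained above.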

Kind regards,
Michael



thank you all for any help, hints or suggestions.


cheers

Darko Hojnik
Datacenter Operations

[email protected]
www.trivago.com


Court of Registration: Amtsgericht Duesseldorf, Registration Number: HRB 51842
Managing directors: Rolf Schroemgens • Malte Siewert • Peter Vinnemeier
trivago GmbH • Bennigsen-Platz 1 • 40474 Duesseldorf, Germany
* This email message may contain legally privileged and/or confidential 
information.
You are hereby notified that any disclosure, copying, distribution, or use of 
this email message is strictly prohibited.

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users



--
DI (FH) Michael Friedrich

[email protected]  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
[email protected]       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi