On 01.11.2014 at 21:32, Darko Hojnik wrote:
Hi there,
For some days now I have been trying to configure notifications to fit our
requirements, without success. I am writing this first of all to ask for help,
but also as feedback, not to start a rant or yet another pointless flame war.
So I am asking myself: are these bugs, missing features, or a misconfiguration?
Should I give up? Is Icinga 2 stable and good enough for my company?
The first task: on 97% of our infrastructure, people must be notified only
when a service goes into a critical state, and notified again when the event
has been acknowledged or the service has gone back to the OK state. That
sounds like a simple task, but it is not.
I do understand that it's sometimes frustrating when something does not
work, and no one is there to take a look or help out.
But you should also consider what others might be doing, or why there is
no simple answer.
I myself am in the middle of writing Icinga 2 documentation (yes,
right now too, because we're behind schedule for the 2.2
release), planning the 1.12 release, and organizing a talk for OSMC in 2.5
weeks. Other Icinga developers probably have enough to do on their own.
And your problem doesn't sound simple. If I got that correctly, you're
using an 8 node cluster (did you see my reply to your thread some hours
ago? there was a question there too).
When a problem is not simple, it will take everyone _time_. Time to
read, understand, reproduce, and write an answer that you find useful or
can get an idea from.
Still, it's open source software, and many of us do it for fun, when not
at work (still, at netways it's also fun at work, but that's different).
Nagging us or bumping questions won't help much - unless it sounds
mission critical, or someone finds it challenging to dig into the
problem - like the 1.x core reload problem recently.
Nevertheless, you should expect that answers may take a while over here.
If you think you've found a bug, then please do report one. The
developers (not only me) appreciate any feedback and possible issues.
Still, the same as before applies - if no one looks into it or
comments on it, it's a matter of todos and time. If you want to sponsor
development time for resolving bugs - sure, just do it.
We are running FreeBSD 10 and Icinga 2.1.1 on our nodes. Almost all of our
servers run FreeBSD.
Some hopefully helpful information follows.
My template:
template Notification "mail-service-notification" {
  command = "mail-service-notification"
  states = [ OK, Warning, Critical, Unknown, ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
}
My commands:
object NotificationCommand "mail-host-notification" {
  import "plugin-notification-command"
  command = [ SysconfDir + "/icinga2/scripts/mail-host-notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    HOSTSTATE = "$host.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    HOSTOUTPUT = "$host.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    USEREMAIL = "$user.email$"
  }
}
object NotificationCommand "sms-host-notification" {
  import "plugin-notification-command"
  command = [ SysconfDir + "/icinga2/scripts/sms-host-notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    HOSTSTATE = "$host.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SHORTDATETIME = "$icinga.short_date_time$"
    HOSTOUTPUT = "$host.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    PAGER = "$user.pager$"
  }
}
object NotificationCommand "mail-service-notification" {
  import "plugin-notification-command"
  command = [ SysconfDir + "/icinga2/scripts/mail-service-notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    USEREMAIL = "$user.email$"
  }
}
object NotificationCommand "sms-service-notification" {
  import "plugin-notification-command"
  command = [ SysconfDir + "/icinga2/scripts/sms-service-notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SHORTDATETIME = "$icinga.short_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    PAGER = "$user.pager$"
  }
}
My inherited notifications:
apply Notification "mail-service-standard-first-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical, OK ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 0
  times.begin = 3m
}
apply Notification "mail-service-standard-re-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical, OK ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 120m
  times.begin = 3m
}
So, as described above, notifications should be sent only for the critical
state and for recovery. But recovery notifications were also sent when the
state had only been warning.
I would remove times.begin here. It doesn't make much sense in
your example, and it will certainly interfere when you test at which point
notifications are sent.
Other than that:
Your assumption about when a recovery occurs is wrong. There never was any
difference between warning, critical and unknown - they are all treated as
NOT-OK, and will trigger a hard state change, no matter which state
change increased the check attempt counter before.
So basically, a notification can even happen like this:
  OK (soft) -> WARNING (1) -> CRITICAL (2) -> WARNING (3) ---> hard state change
But your users (contacts in Icinga 1/Nagios) wouldn't be notified, just
because their filters don't allow Warning conditions.
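To make the attempt counter concrete, here is a minimal sketch of a service definition. The service name, the `dummy` check command, the host filter and all values are assumptions for illustration only and are not taken from the setup discussed in this thread.

```icinga2
// Hypothetical example service, not part of the original configuration.
apply Service "raid-status-example" {
  // Assumption: "dummy" stands in for whatever check is really used.
  check_command = "dummy"

  // The third consecutive NOT-OK result turns the soft state into a hard one,
  // regardless of whether the results were Warning, Critical or Unknown.
  max_check_attempts = 3

  check_interval = 5m   // regular check cadence
  retry_interval = 1m   // faster re-checks while the state is still soft

  assign where host.name == "dev3"
}
```

With these values, the sequence WARNING (1) -> CRITICAL (2) -> WARNING (3) reaches the hard state on the third result, and the state at that moment (Warning) is the one the notification filters are compared against.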
In a similar fashion, a recovery only happens when returning from NOT-OK to
OK. Icinga 1.x and 2.x don't store information on who was notified through
which state/type filter, so they cannot take that into account on service
recovery when deciding who finally gets a notification.
Feel free to open or sponsor a feature request, but I don't think that
this will be an easy task to implement.
You're not alone, though: that topic has been discussed for many years
now, but no one has fixed it (only external addons such as NoMa do, with
their own filter capabilities).
So basically Icinga 1.x works like Icinga 2.x in this regard, but most
likely you don't know Icinga 1.x or Nagios and haven't encountered this
feature, or shortcoming, before. Which of the two it is depends on the user.
apply Notification "mail-service-standard-first-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 0
  times.begin = 3m
}
apply Notification "mail-service-standard-re-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 120m
  times.begin = 3m
}
With the configuration above, notifications were sent for the critical state,
but not a single notification for the recovery.
In conclusion, this means that people get service messages with state OK even
if they have not seen any earlier message, because the preceding state was
only warning. That confuses people and reduces the acceptance of Icinga 2.
How can I achieve that notifications are sent for the critical state and that
a recovery message follows once the issue no longer exists? That is one of
the things I want most.
The second issue is that notifications are not sent at the right time.
apply Notification "mail-service-standard-re-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 120m
  times.begin = 3m
}
In my test environment the first notification arrived more than 30 minutes
late, and the re-notification was also sent very late. Sometimes nothing
happened at all.
[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for
notification object 'dev3!Raid Status!mail-service-standard': before escalation
range
[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for
notification object 'dev3!Raid Status!sms-standard': before escalation range
[2014-10-30 19:55:15 +0000] information/NotificationComponent: Sending reminder
notification for object 'dev3!Raid Status'
[2014-10-30 19:55:15 +0000] notice/Notification: Not sending notifications for
notification object 'dev3!Raid Status!sms-standard': before escalation range
Could you explain a bit more, how these lines correlate to your
observations in your test lab? The only thing I do see here is that
times.begin skips the initial notification. It does not delay it though.
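For reference, a minimal sketch of what the escalation range is meant for. It reuses the `mail-service-notification` template from above; the object name and the user group `second-level-admins-email` are hypothetical and not part of the original setup. The comments follow the behaviour suggested by the log lines above and are not an authoritative description of the implementation.

```icinga2
// Hypothetical escalation: only becomes active 30 minutes into a problem.
apply Notification "mail-service-escalation-example" to Service {
  import "mail-service-notification"

  // Assumption: this user group does not exist in the original setup.
  user_groups = [ "second-level-admins-email" ]

  states = [ Critical, OK ]
  types = [ Problem, Recovery ]

  // Problem notifications attempted before this offset are dropped
  // ("before escalation range"); they are not queued for later delivery.
  times.begin = 30m

  // A reminder interval above 0 is what produces a later attempt
  // that can fall inside the range.
  interval = 10m

  assign where service.vars.sla == "24x7"
}
```

Under this reading, combining times.begin with interval = 0 would drop the initial notification and never produce a later one, which would match the "sometimes nothing happened" observation.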
So my crude hack is:
apply Notification "mail-service-standard-first-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 0
  times.begin = 3m
}
apply Notification "mail-service-standard-re-notification" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]
  assign where service.vars.sla == "24x7"
  states = [ Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"
  interval = 10m
  times.begin = 3m
}
That forces a notification to be sent immediately and every 10 minutes
afterwards. Is this really the way to get notifications sent at the right time?
As said, I would remove the escalation range (times.begin), keep in mind
that not-ok states will always generate recovery notifications, and then
test again.
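A minimal sketch of what such a test configuration could look like, assuming the separate first-notification and re-notification objects can be merged into one. The object name is hypothetical; everything else is taken from the configuration quoted above.

```icinga2
// Sketch: one notification object, no escalation range.
apply Notification "mail-service-standard-example" to Service {
  import "mail-service-notification"
  user_groups = [ "trivago-admins-email" ]

  // OK stays in the list: with states = [ Critical ] alone, the test
  // described above produced no recovery mail at all.
  states = [ Critical, OK ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  period = "24x7"

  // No times.begin: the first mail goes out on the hard state change,
  // reminders follow every two hours.
  interval = 120m

  assign where service.vars.sla == "24x7"
}
```

This does not remove the limitation described earlier: a recovery mail would still go out after a hard Warning state, because Icinga does not track who was notified about the problem.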
Kind regards,
Michael
thank you all for any help, hints or suggestions.
cheers
Darko Hojnik
Datacenter Operations
[email protected]
www.trivago.com
_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
--
DI (FH) Michael Friedrich
[email protected] || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
[email protected] || https://www.icinga.org/team
irc.freenode.net/icinga || dnsmichi