Hi guys,
I run an Icinga 2 cluster with 4 nodes (2 masters, 2 checkers), and the
scheduling behavior is quite strange!
See my config below. The test-fail service state jumped from 1/5 SOFT
straight to 1/5 HARD, instead of going 1/5 SOFT -> 2/5 SOFT -> ... -> 5/5 SOFT
-> 5/5 HARD.

Also, the notification for test-fail-10 is late. The HARD alert is at
1456232652, but the notification is at 1456234216 (1564 seconds, about 26
minutes later), which is the same time as the second test-fail notification.

# service.conf
apply Service "test-fail" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
  check_command = "always-fail"
  assign where host.name == "carl2"
}

apply Service "test-fail-10" {
  max_check_attempts = 3
  check_interval = 10m
  retry_interval = 30s
  check_command = "always-fail"
  assign where host.name == "carl2"
}
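
For a sense of scale, here is a rough Python estimate of how long each service above needs before it can even reach its last check attempt. It assumes that reaching attempt N takes N-1 further check results after the first failure, each spaced one full `retry_interval` apart, and it ignores any scheduling offsets Icinga 2 may apply. `ServiceTiming` and `min_seconds_to_last_attempt` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceTiming:
    # Hypothetical container for the values from service.conf.
    name: str
    max_check_attempts: int
    retry_interval_s: int  # retry_interval expressed in seconds


def min_seconds_to_last_attempt(svc: ServiceTiming) -> int:
    """Rough minimum time from the first failing check (attempt 1) until the
    attempt counter reaches max_check_attempts, assuming each re-check waits
    one full retry_interval."""
    return (svc.max_check_attempts - 1) * svc.retry_interval_s


services = [
    ServiceTiming("test-fail", max_check_attempts=5, retry_interval_s=30),
    ServiceTiming("test-fail-10", max_check_attempts=3, retry_interval_s=30),
]
for svc in services:
    print(svc.name, min_seconds_to_last_attempt(svc))
# test-fail 120
# test-fail-10 60
```

By this estimate test-fail should need around two minutes of retries before it can turn HARD, yet all four nodes logged `WARNING;HARD;1` for it at 1456232413.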

# zones.conf
object Endpoint "sindar33a.intra.douban.com" {
  host = "sindar33a"
}

object Endpoint "sindar33b.intra.douban.com" {
  host = "sindar33b"
}

object Endpoint "sindar33c.intra.douban.com" {
  host = "sindar33c"
}

object Endpoint "sindar33d.intra.douban.com" {
  host = "sindar33d"
}

object Zone "master" {
  endpoints = [
    "sindar33a.intra.douban.com",
    "sindar33b.intra.douban.com",
  ]
}

object Zone "checker" {
  endpoints = [
    "sindar33c.intra.douban.com",
    "sindar33d.intra.douban.com",
  ],
  parent = "master"
}

admin@sindar33a ~ $ tail -F /var/log/icinga2/compat/icinga.log | grep 'carl2;test'
[1456232407] CURRENT SERVICE STATE: carl2;test-fail;UNKNOWN;SOFT;1;
[1456232407] CURRENT SERVICE STATE: carl2;test-fail-10;UNKNOWN;SOFT;1;
[1456232413] SERVICE ALERT: carl2;test-fail;WARNING;HARD;1;Traceback (most recent call last):
[1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;Traceback (most recent call last):
[1456234216] SERVICE NOTIFICATION: lihan-test;carl2;test-fail;WARNING;mail-service-notification;Traceback (most recent call last):;
[1456234216] SERVICE NOTIFICATION: lihan-test;carl2;test-fail-10;WARNING;mail-service-notification;Traceback (most recent call last):;

admin@sindar33b ~ $ tail -F /var/log/icinga2/compat/icinga.log | grep 'carl2;test'
[1456232410] CURRENT SERVICE STATE: carl2;test-fail;UNKNOWN;SOFT;1;
[1456232410] CURRENT SERVICE STATE: carl2;test-fail-10;UNKNOWN;SOFT;1;
[1456232413] SERVICE ALERT: carl2;test-fail;WARNING;HARD;1;Traceback (most recent call last):
[1456232415] SERVICE NOTIFICATION: admin-test;carl2;test-fail;WARNING;mail-service-notification;Traceback (most recent call last):;
[1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;Traceback (most recent call last):

admin@sindar33c ~ $ tail -F /var/log/icinga2/compat/icinga.log | grep 'carl2;test'
[1456232409] CURRENT SERVICE STATE: carl2;test-fail;UNKNOWN;SOFT;1;
[1456232409] CURRENT SERVICE STATE: carl2;test-fail-10;UNKNOWN;SOFT;1;
[1456232413] SERVICE ALERT: carl2;test-fail;WARNING;HARD;1;Traceback (most recent call last):
[1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;Traceback (most recent call last):

admin@sindar33d ~ $ tail -F /var/log/icinga2/compat/icinga.log | grep 'carl2;test'
[1456232408] CURRENT SERVICE STATE: carl2;test-fail;UNKNOWN;SOFT;1;
[1456232408] CURRENT SERVICE STATE: carl2;test-fail-10;UNKNOWN;SOFT;1;
[1456232413] SERVICE ALERT: carl2;test-fail;WARNING;HARD;1;Traceback (most recent call last):
[1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;Traceback (most recent call last):
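
To check the notification delay mechanically on each node, here is a minimal Python sketch that reads compat log entries and reports, per service, the seconds between a HARD alert and the first notification that follows it. It assumes one entry per line, as icinga.log writes them, and the field order seen in the excerpts here (host;service;state;state_type;... for SERVICE ALERT, user;host;service;... for SERVICE NOTIFICATION). The function name `notification_delays` is just illustrative.

```python
import re

# Matches compat log entries such as:
#   [1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;...
#   [1456234216] SERVICE NOTIFICATION: lihan-test;carl2;test-fail-10;WARNING;...
LINE_RE = re.compile(r"^\[(\d+)\] (SERVICE ALERT|SERVICE NOTIFICATION): (.*)$")


def notification_delays(lines):
    """For each (host, service), return the seconds between the latest HARD
    alert and the first notification that follows a HARD alert."""
    hard_at = {}  # (host, service) -> timestamp of the latest HARD alert
    delays = {}   # (host, service) -> delay of the first notification
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. CURRENT SERVICE STATE entries
        ts, kind, fields = int(m.group(1)), m.group(2), m.group(3).split(";")
        if kind == "SERVICE ALERT" and len(fields) >= 4:
            host, service, _state, state_type = fields[:4]
            if state_type == "HARD":
                hard_at[(host, service)] = ts
        elif kind == "SERVICE NOTIFICATION" and len(fields) >= 3:
            # Notification entries start with the notified user.
            _user, host, service = fields[:3]
            key = (host, service)
            if key in hard_at and key not in delays:
                delays[key] = ts - hard_at[key]
    return delays


# The sindar33a excerpt, one entry per line.
sample = [
    "[1456232407] CURRENT SERVICE STATE: carl2;test-fail;UNKNOWN;SOFT;1;",
    "[1456232413] SERVICE ALERT: carl2;test-fail;WARNING;HARD;1;Traceback (most recent call last):",
    "[1456232652] SERVICE ALERT: carl2;test-fail-10;WARNING;HARD;1;Traceback (most recent call last):",
    "[1456234216] SERVICE NOTIFICATION: lihan-test;carl2;test-fail;WARNING;mail-service-notification;Traceback (most recent call last):;",
    "[1456234216] SERVICE NOTIFICATION: lihan-test;carl2;test-fail-10;WARNING;mail-service-notification;Traceback (most recent call last):;",
]
print(notification_delays(sample))
# {('carl2', 'test-fail'): 1803, ('carl2', 'test-fail-10'): 1564}
```

The same function can be pointed at each node's icinga.log to compare which node sent which notification and how long after the HARD alert.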
Thanks in advance for your help!
Regards
--
Harry Lee | SA Dept. | Douban Inc.
_______________________________________________
icinga-users mailing list
https://lists.icinga.org/mailman/listinfo/icinga-users