Hello, all.
Hoping someone has encountered something similar and already has a solution. I
am trying to find the most straightforward way to allow a higher timeout
value for commands issued against a limited set of hosts or devices that are
known to have slow connections. If I just set the “timeout” value in the command
definitions high enough for the slowest connections, it ends up 2
or 3 times as long as I need or want for most of the hosts I monitor.
As things stand, we either live with a set of hosts that generate “white noise”
by frequently showing as down or with failed service checks, only to recover
on their own, or we increase the timeouts and/or the number of failed checks to
the point where legitimate problems go unnoticed for too long.
What occurs to me is to have two entirely separate command definitions, named
for example “my_check” and “my_check_slow”, and define the service checks
for the high-latency hosts to use the “slow” commands. This seems very
primitive and prone to errors, i.e. admins adding or changing a command and the
change being overlooked for the “slow” hosts. It would look something like:
// conditions_for_* are placeholders for the real assign expressions
object CheckCommand "my_check" {
  command = [ "/path/to/command/command" ]
  timeout = 45
}

object CheckCommand "my_check_slow" {
  command = [ "/path/to/command/command" ]
  timeout = 180
}

apply Service "my_service" {
  check_command = "my_check"
  assign where match(conditions_for_normal_hosts)
}

apply Service "my_service_slow" {
  check_command = "my_check_slow"
  assign where match(conditions_for_slow_hosts)
}
Not pretty.
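One way to at least reduce the duplication might be a shared template, so the command line lives in one place and the two commands differ only in their timeout. This is a rough sketch, not something I have in production; the template name "my_check_base" is made up for illustration:

```
// Hypothetical shared template: the command line is defined once here
template CheckCommand "my_check_base" {
  command = [ "/path/to/command/command" ]
}

// Normal hosts: inherit the command, set the short timeout
object CheckCommand "my_check" {
  import "my_check_base"
  timeout = 45
}

// Slow hosts: same command, only the timeout differs
object CheckCommand "my_check_slow" {
  import "my_check_base"
  timeout = 180
}
```

An admin changing the command would then only touch the template, but a brand-new command would still need its "slow" twin added by hand.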
What would seem to make more sense is being able to define a timeout
value on the service check itself, so that when I write my apply statement to
perform a given check on the servers I tagged as “slow”, it also sets the timeout
value. However, so far this does not seem to work; timeout does not seem to be a
valid attribute for a service.
object CheckCommand "my_check" {
  command = [ "/path/to/command/command" ]
}

apply Service "my_service" {
  check_command = "my_check"
  timeout = 45    // does not seem to be accepted on a Service
  assign where match(conditions_for_normal_hosts)
}

apply Service "my_service_slow" {
  check_command = "my_check"
  timeout = 180   // does not seem to be accepted on a Service
  assign where match(conditions_for_slow_hosts)
}
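For what it is worth, the documentation for newer Icinga 2 releases lists a check_timeout attribute on Host and Service objects that is described as overriding the CheckCommand's timeout. Assuming the installed version supports it (I have not confirmed which release introduced it), the idea above might look roughly like this:

```
// Single command definition; its timeout acts as the default
object CheckCommand "my_check" {
  command = [ "/path/to/command/command" ]
  timeout = 45
}

apply Service "my_service_slow" {
  check_command = "my_check"
  // Assumption: check_timeout is available in the running version
  // and overrides the CheckCommand timeout for this service only
  check_timeout = 180
  assign where match(conditions_for_slow_hosts)
}
```

If the attribute is not available, the config validation should reject it in the same way as the plain timeout attempt above.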
Suggestions / experience?
Cheers
Jay