Yuvipanda has submitted this change and it was merged.

Change subject: icinga: Increase timeout for tools-checker checks
......................................................................


icinga: Increase timeout for tools-checker checks

Tools Checker is serialized - only one request can be served
at a time. This is because several of our checks are not
re-entrant, and running them in parallel would produce weird
results. However, the server-side timeout is 60s, and the
client-side timeout is only 10s. This causes multiple problems:

  - Some tests do take longer than 10s, and this is ok
  - A check might be queued because a different check ahead of
    it in the queue is taking longer to execute, and icinga
    gives up before giving it enough time.

So this sets a 5-minute client-side timeout, which allows for
up to 4 long-running requests ahead of a particular request in
the queue before alerting. When things fail and the tools checker
returns appropriate messages, this timeout has no effect. When
the tools checker just takes forever to respond because other
things are messed up, the 5-minute delay in the page is probably
fine too.
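
As a rough check of the arithmetic above, here is a minimal sketch. It
assumes every request ahead in the queue runs for the full 60s
server-side timeout; the function and constant names are illustrative
and not part of this change.

```python
SERVER_TIMEOUT_S = 60    # server-side timeout per request, from the text above
CLIENT_TIMEOUT_S = 300   # client-side (icinga) timeout set by this change


def max_slow_requests_ahead(client_timeout_s: int, server_timeout_s: int) -> int:
    """Worst-case count of full-length requests that can run ahead of ours
    while still leaving one full server timeout for our own request."""
    return client_timeout_s // server_timeout_s - 1


print(max_slow_requests_ahead(CLIENT_TIMEOUT_S, SERVER_TIMEOUT_S))  # 4
```

Five 60s slots fit in 300s; one is reserved for the request itself,
leaving room for 4 slow requests ahead of it.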

Bug: T136775
Change-Id: I4ffc0cdc8515d653e222f85f00201cbee4a19c4d
---
M modules/icinga/manifests/monitor/toollabs.pp
M modules/nagios_common/files/checkcommands.cfg
2 files changed, 6 insertions(+), 1 deletion(-)

Approvals:
  Yuvipanda: Verified; Looks good to me, approved



diff --git a/modules/icinga/manifests/monitor/toollabs.pp b/modules/icinga/manifests/monitor/toollabs.pp
index d90e00d..b56470a 100644
--- a/modules/icinga/manifests/monitor/toollabs.pp
+++ b/modules/icinga/manifests/monitor/toollabs.pp
@@ -45,7 +45,7 @@
     }
 
     # tests are pass/fail based on string return check
-    $checker="check_http_url_at_address_for_string!${test_entry_host}"
+    $checker="check_http_url_at_address_for_string_with_timeout!300!${test_entry_host}"
 
     monitoring::service { 'tools-checker-self':
         description   => 'toolschecker service itself needs to return OK',
diff --git a/modules/nagios_common/files/checkcommands.cfg b/modules/nagios_common/files/checkcommands.cfg
index e767ec3..482b62d 100644
--- a/modules/nagios_common/files/checkcommands.cfg
+++ b/modules/nagios_common/files/checkcommands.cfg
@@ -259,6 +259,11 @@
     }
 
 define command {
+    command_name    check_http_url_at_address_for_string_with_timeout
+    command_line    $USER1$/check_http -t $ARG1$ -H $ARG2$ -I $ARG2$ -u $ARG3$ 
-s $ARG4$
+    }
+
+define command {
     command_name    check_http_url_at_address_for_string
     command_line    $USER1$/check_http -H $ARG1$ -I $ARG1$ -u $ARG2$ -s $ARG3$
     }
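
To illustrate how the `!`-separated check command string maps onto the
`$ARGn$` macros in the new command definition, here is a simplified
sketch of the substitution. The host name, URL path and expected string
in the usage line are hypothetical placeholders, not values taken from
this change, and the real Icinga/Nagios macro handling covers more cases
(for example escaped `!` characters).

```python
import re


def expand_check_command(check_command: str, command_line: str) -> str:
    """Substitute $ARGn$ macros with the '!'-separated arguments.

    Simplified model: the first field is the command name and the rest
    become $ARG1$, $ARG2$, ... Other macros such as $USER1$ are left as-is.
    """
    _name, *args = check_command.split("!")

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        # Macros with no matching argument expand to an empty string here.
        return args[index - 1] if 1 <= index <= len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, command_line)


COMMAND_LINE = "$USER1$/check_http -t $ARG1$ -H $ARG2$ -I $ARG2$ -u $ARG3$ -s $ARG4$"

# Hypothetical host, path and match string, for illustration only.
expanded = expand_check_command(
    "check_http_url_at_address_for_string_with_timeout"
    "!300!checker.example.org!/self!OK",
    COMMAND_LINE,
)
print(expanded)
```

With the new command, the timeout becomes `$ARG1$` and the remaining
arguments each shift up by one compared to the original
`check_http_url_at_address_for_string` definition.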

-- 
To view, visit https://gerrit.wikimedia.org/r/292297
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4ffc0cdc8515d653e222f85f00201cbee4a19c4d
Gerrit-PatchSet: 2
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Yuvipanda <[email protected]>
Gerrit-Reviewer: RobH <[email protected]>
Gerrit-Reviewer: Yuvipanda <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
