Faidon Liambotis has uploaded a new change for review.
https://gerrit.wikimedia.org/r/130095
Change subject: Kill check_job_queue
......................................................................
Kill check_job_queue
check_job_queue has been set to ever-increasing, arbitrary thresholds
since its inception, yet its checks have not gotten any smarter: it
still relies on the total number of jobs in the queue.
This has historically resulted in a very high number of false positives,
which in turn has led everyone to routinely ignore it, even when it has
been firing for weeks at a time.
Meaningless CRITICAL alerts breed bad habits and distract us from real
problems. Kill the check with fire, until someone comes up with a better
check that uses more heuristics.
Change-Id: I0005cb6f4913dcb4e046170fa086ef5df816522f
---
M manifests/misc/icinga.pp
M manifests/site.pp
M templates/icinga/checkcommands.cfg.erb
3 files changed, 0 insertions(+), 23 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/95/130095/1
diff --git a/manifests/misc/icinga.pp b/manifests/misc/icinga.pp
index 6bfb873..2395ce0 100644
--- a/manifests/misc/icinga.pp
+++ b/manifests/misc/icinga.pp
@@ -647,23 +647,6 @@
}
}
-class icinga::monitor::jobqueue {
- include applicationserver::packages
-
- file {'/usr/lib/nagios/plugins/check_job_queue':
- source => 'puppet:///files/icinga/check_job_queue',
- owner => 'root',
- group => 'root',
- mode => '0755',
- }
-
- nrpe::monitor_service { 'check_job_queue':
- description => 'check_job_queue',
- nrpe_command => '/usr/lib/nagios/plugins/check_job_queue',
- timeout => 30,
- }
-}
-
class icinga::monitor::naggen {
# Naggen takes exported resources from hosts and creates nagios
diff --git a/manifests/site.pp b/manifests/site.pp
index 2773139..4dfd77e 100644
--- a/manifests/site.pp
+++ b/manifests/site.pp
@@ -2327,7 +2327,6 @@
include role::applicationserver::maintenance
include role::db::maintenance
include misc::deployment::scap_scripts
- include icinga::monitor::jobqueue
include misc::monitoring::jobqueue
include admins::roots
include admins::mortals
diff --git a/templates/icinga/checkcommands.cfg.erb b/templates/icinga/checkcommands.cfg.erb
index 391a466..5edfdbf 100644
--- a/templates/icinga/checkcommands.cfg.erb
+++ b/templates/icinga/checkcommands.cfg.erb
@@ -325,11 +325,6 @@
}
define command{
- command_name check_job_queue
- command_line $USER1$/check_job_queue
- }
-
-define command{
command_name nrpe_check_raid
command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_raid
}
--
To view, visit https://gerrit.wikimedia.org/r/130095
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0005cb6f4913dcb4e046170fa086ef5df816522f
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits