Faidon Liambotis has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/130095

Change subject: Kill check_job_queue
......................................................................

Kill check_job_queue

check_job_queue has been set to ever-increasing, arbitrary thresholds
since its inception, while at the same time it hasn't gotten smarter in
its checks and still relies on the total number of jobs in the queue.

This has historically resulted in a very high number of false positives,
which in turn has led everyone to regularly ignore it, even when it has
been firing for weeks at a time.

Meaningless CRITICAL checks are a bad habit and distract us from real
problems. Kill the check with fire, until someone comes up with a better
check that includes more heuristics.

Change-Id: I0005cb6f4913dcb4e046170fa086ef5df816522f
---
M manifests/misc/icinga.pp
M manifests/site.pp
M templates/icinga/checkcommands.cfg.erb
3 files changed, 0 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/95/130095/1

diff --git a/manifests/misc/icinga.pp b/manifests/misc/icinga.pp
index 6bfb873..2395ce0 100644
--- a/manifests/misc/icinga.pp
+++ b/manifests/misc/icinga.pp
@@ -647,23 +647,6 @@
     }
 }
 
-class icinga::monitor::jobqueue {
-    include applicationserver::packages
-
-    file {'/usr/lib/nagios/plugins/check_job_queue':
-        source => 'puppet:///files/icinga/check_job_queue',
-        owner  => 'root',
-        group  => 'root',
-        mode   => '0755',
-    }
-
-    nrpe::monitor_service { 'check_job_queue':
-        description  => 'check_job_queue',
-        nrpe_command => '/usr/lib/nagios/plugins/check_job_queue',
-        timeout      => 30,
-    }
-}
-
 class icinga::monitor::naggen {
 
     # Naggen takes exported resources from hosts and creates nagios
diff --git a/manifests/site.pp b/manifests/site.pp
index 2773139..4dfd77e 100644
--- a/manifests/site.pp
+++ b/manifests/site.pp
@@ -2327,7 +2327,6 @@
     include role::applicationserver::maintenance
     include role::db::maintenance
     include misc::deployment::scap_scripts
-    include icinga::monitor::jobqueue
     include misc::monitoring::jobqueue
     include admins::roots
     include admins::mortals
diff --git a/templates/icinga/checkcommands.cfg.erb b/templates/icinga/checkcommands.cfg.erb
index 391a466..5edfdbf 100644
--- a/templates/icinga/checkcommands.cfg.erb
+++ b/templates/icinga/checkcommands.cfg.erb
@@ -325,11 +325,6 @@
        }
 
 define command{
-       command_name    check_job_queue
-       command_line    $USER1$/check_job_queue
-       }
-
-define command{
        command_name    nrpe_check_raid
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_raid
        }

-- 
To view, visit https://gerrit.wikimedia.org/r/130095
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0005cb6f4913dcb4e046170fa086ef5df816522f
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
