Giuseppe Lavagetto has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/276710

Change subject: jobrunner: monitor the HHVM server health
......................................................................

jobrunner: monitor the HHVM server health

We were blind for about a day to the jobrunners not working, because
there is currently no way to monitor the HHVM server that the jobqueue
process feeds jobs to. This patch introduces a simple check that hits
HHVM and warns us in case of problems.

Change-Id: I6b99bba4eb3459f90bd27c0456308ebeedca3410
---
M modules/nagios_common/files/checkcommands.cfg
M modules/role/manifests/mediawiki/jobrunner.pp
2 files changed, 13 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/10/276710/1

diff --git a/modules/nagios_common/files/checkcommands.cfg b/modules/nagios_common/files/checkcommands.cfg
index ec18696..765a033 100644
--- a/modules/nagios_common/files/checkcommands.cfg
+++ b/modules/nagios_common/files/checkcommands.cfg
@@ -206,6 +206,12 @@
     command_line    $USER1$/check_http -H en.wikipedia.org -I $HOSTADDRESS$ -u /wiki/Main_Page
     }
 
+# 'check_http_jobrunner' command definition, querying the rpc endpoint
+define command {
+    command_name    check_http_jobrunner
+    command_line    $USER1$/check_http -I $HOSTADDRESS$ -p 9005 -u /rpc/RunJobs.php
+    }
+
 
 # 'check_http_upload' command definition, querying a different URL
 define command {
diff --git a/modules/role/manifests/mediawiki/jobrunner.pp b/modules/role/manifests/mediawiki/jobrunner.pp
index aba2a9e..8fa3df2 100644
--- a/modules/role/manifests/mediawiki/jobrunner.pp
+++ b/modules/role/manifests/mediawiki/jobrunner.pp
@@ -3,5 +3,11 @@
 
     include ::role::mediawiki::common
     include ::mediawiki::jobrunner
-}
 
+    monitoring::service { 'jobrunner_http_hhvm':
+        description   => 'HHVM jobrunner',
+        check_command => 'check_http_jobrunner',
+        retries       => 2,
+    }
+
+}
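
For anyone who wants to reproduce the probe by hand, here is a minimal Python sketch of roughly what the new check_http_jobrunner command does: an HTTP GET against port 9005 at /rpc/RunJobs.php, mapped to a Nagios exit state. The function names (classify_status, probe) are hypothetical and not part of the patch. The status mapping (2xx/3xx OK, 4xx WARNING, 5xx or no connection CRITICAL) is an assumption about check_http's defaults, and the patch does not say what RunJobs.php returns for a bare GET, so treat this as an illustration rather than the plugin's actual logic.

```python
import http.client
import urllib.error
import urllib.request

# Nagios plugin exit states.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_status(status):
    """Map an HTTP status code to a Nagios state.

    Assumed mapping, meant to approximate check_http's defaults:
    2xx and 3xx are OK, 4xx is WARNING, anything else is CRITICAL.
    """
    if 200 <= status < 400:
        return OK
    if 400 <= status < 500:
        return WARNING
    return CRITICAL


def probe(host, port=9005, path="/rpc/RunJobs.php", timeout=10.0):
    """GET http://host:port/path and return a Nagios state.

    Port and path are taken from the command definition in the patch;
    the timeout value is an arbitrary choice for this sketch.
    """
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still available.
        return classify_status(err.code)
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, DNS failure or a malformed reply.
        return CRITICAL
```

Calling probe() with a jobrunner's address returns 0, 1 or 2, which a wrapper script could pass to sys.exit() to behave like a Nagios plugin.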

-- 
To view, visit https://gerrit.wikimedia.org/r/276710
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6b99bba4eb3459f90bd27c0456308ebeedca3410
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Giuseppe Lavagetto <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
