Giuseppe Lavagetto has uploaded a new change for review.
https://gerrit.wikimedia.org/r/276710
Change subject: jobrunner: monitor the HHVM server health
......................................................................
jobrunner: monitor the HHVM server health
We were unaware for about a day that the jobrunners were not working,
because there is currently no way to monitor the HHVM server that the
jobqueue process feeds with jobs. This patch introduces a simple check
that hits HHVM and warns us in case of problems.
Change-Id: I6b99bba4eb3459f90bd27c0456308ebeedca3410
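
For illustration only, here is a minimal Python sketch of the kind of probe
the new check performs: an HTTP GET against port 9005 at /rpc/RunJobs.php,
mapped onto Nagios exit states. The names classify_status and probe are
hypothetical, and the status-to-state mapping is an assumption about
check_http's defaults (5xx or no connection is CRITICAL, 4xx is WARNING,
anything else is OK). It is not the plugin's actual implementation.

```python
import http.client

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def classify_status(status):
    """Map an HTTP status code to a Nagios state (assumed check_http defaults)."""
    if status < 100 or status >= 600:
        return CRITICAL  # not a valid HTTP status code
    if status >= 500:
        return CRITICAL  # server-side error
    if status >= 400:
        return WARNING   # client-side error
    return OK            # 1xx, 2xx and 3xx are treated as healthy here


def probe(host, port=9005, path="/rpc/RunJobs.php", timeout=10.0):
    """GET the endpoint and return a Nagios state; connection errors are CRITICAL."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return classify_status(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return CRITICAL  # refused, timed out, or malformed response
    finally:
        conn.close()
```

A wrapper script could call sys.exit(probe(host)) so that the exit code
carries the state back to the monitoring system.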
---
M modules/nagios_common/files/checkcommands.cfg
M modules/role/manifests/mediawiki/jobrunner.pp
2 files changed, 13 insertions(+), 1 deletion(-)
git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/10/276710/1
diff --git a/modules/nagios_common/files/checkcommands.cfg b/modules/nagios_common/files/checkcommands.cfg
index ec18696..765a033 100644
--- a/modules/nagios_common/files/checkcommands.cfg
+++ b/modules/nagios_common/files/checkcommands.cfg
@@ -206,6 +206,12 @@
command_line $USER1$/check_http -H en.wikipedia.org -I $HOSTADDRESS$ -u /wiki/Main_Page
}
+# 'check_http_jobrunner' command definition, querying the rpc endpoint
+define command {
+ command_name check_http_jobrunner
+ command_line $USER1$/check_http -I $HOSTADDRESS$ -p 9005 -u /rpc/RunJobs.php
+ }
+
# 'check_http_upload' command definition, querying a different URL
define command {
diff --git a/modules/role/manifests/mediawiki/jobrunner.pp b/modules/role/manifests/mediawiki/jobrunner.pp
index aba2a9e..8fa3df2 100644
--- a/modules/role/manifests/mediawiki/jobrunner.pp
+++ b/modules/role/manifests/mediawiki/jobrunner.pp
@@ -3,5 +3,11 @@
include ::role::mediawiki::common
include ::mediawiki::jobrunner
-}
+ monitoring::service { 'jobrunner_http_hhvm':
+ description => 'HHVM jobrunner',
+ check_command => 'check_http_jobrunner',
+ retries => 2,
+ }
+
+}
--
To view, visit https://gerrit.wikimedia.org/r/276710
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6b99bba4eb3459f90bd27c0456308ebeedca3410
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Giuseppe Lavagetto <[email protected]>