Ema has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/337808 )

Change subject: varnish: icinga check for expiry mailbox lag
......................................................................


varnish: icinga check for expiry mailbox lag

We have found a correlation between the 503 errors described in T145661
and the varnish expiry thread not being able to catch up with its
mailbox.

Add an icinga check on the lag (exp_mailed minus exp_received, as
reported by varnishstat) that warns above 1000 and goes critical
above 10000.

Bug: T145661
Change-Id: I5e76b594d8c57fa9a679088c794b04c7879be715
---
A modules/varnish/files/check_varnish_expiry_mailbox_lag.sh
M modules/varnish/manifests/common.pp
2 files changed, 46 insertions(+), 0 deletions(-)

Approvals:
  Ema: Verified; Looks good to me, approved
  BBlack: Looks good to me, but someone else must approve



diff --git a/modules/varnish/files/check_varnish_expiry_mailbox_lag.sh b/modules/varnish/files/check_varnish_expiry_mailbox_lag.sh
new file mode 100755
index 0000000..32c0f22
--- /dev/null
+++ b/modules/varnish/files/check_varnish_expiry_mailbox_lag.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+cmd="/usr/bin/varnishstat -t off -1"
+
+if ! $cmd > /dev/null 2>&1;
+then
+    echo "UNKNOWN: cannot run varnishstat"
+    exit 3
+fi
+
+$cmd | awk '
+/exp_mailed/ { m = $2 }
+/exp_received/ { r = $2 }
+
+END {
+    msg = "expiry mailbox lag is "
+    lag = m - r
+
+    if (lag > 10000) {
+        print "CRITICAL: " msg lag
+        exit 2
+    }
+    else if (lag > 1000) {
+        print "WARNING: " msg lag
+        exit 1
+    } else {
+        print "OK: " msg lag
+        exit 0
+    }
+}'
diff --git a/modules/varnish/manifests/common.pp b/modules/varnish/manifests/common.pp
index 866dd25..4cecae3 100644
--- a/modules/varnish/manifests/common.pp
+++ b/modules/varnish/manifests/common.pp
@@ -109,4 +109,20 @@
         group  => 'root',
         mode   => '0444',
     }
+
+    # We have found a correlation between the 503 errors described in T145661
+    # and the expiry thread not being able to catch up with its mailbox
+    file { '/usr/local/lib/nagios/plugins/check_varnish_expiry_mailbox_lag':
+        ensure => present,
+        source => 'puppet:///modules/role/varnish/check_varnish_expiry_mailbox_lag.sh',
+        mode   => '0555',
+        owner  => 'root',
+        group  => 'root',
+    }
+
+    nrpe::monitor_service { 'check_varnish_expiry_mailbox_lag':
+        description  => 'Check Varnish expiry mailbox lag',
+        nrpe_command => '/usr/local/lib/nagios/plugins/check_varnish_expiry_mailbox_lag',
+        require      => File['/usr/local/lib/nagios/plugins/check_varnish_expiry_mailbox_lag'],
+    }
 }
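
For anyone who wants to try the threshold logic without a running
Varnish, here is a minimal sketch. It uses the same awk arithmetic as
check_varnish_expiry_mailbox_lag.sh above, but reads varnishstat-style
text from stdin instead of calling varnishstat. The function name
mailbox_lag_status and the sample counter values are made up for
illustration and are not part of the change.

```shell
#!/bin/sh
# Sketch only: same lag arithmetic and thresholds as the check in this
# change, reading "name value ..." lines from stdin.
# mailbox_lag_status is a hypothetical name used for illustration.

mailbox_lag_status() {
    awk '
    /exp_mailed/   { m = $2 }
    /exp_received/ { r = $2 }

    END {
        msg = "expiry mailbox lag is "
        lag = m - r

        if (lag > 10000) {
            print "CRITICAL: " msg lag
            exit 2
        } else if (lag > 1000) {
            print "WARNING: " msg lag
            exit 1
        } else {
            print "OK: " msg lag
            exit 0
        }
    }'
}

# Hypothetical sample input; the counter values are invented.
printf '%s\n' \
    'MAIN.exp_mailed      52000  1.20 Number of objects mailed to expiry thread' \
    'MAIN.exp_received    50500  1.19 Number of objects received by expiry thread' \
    | mailbox_lag_status
echo "exit status: $?"
```

With the sample values the lag is 52000 - 50500 = 1500, so the sketch
prints a WARNING line and returns status 1, matching the Nagios plugin
convention the real check follows (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN).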

-- 
To view, visit https://gerrit.wikimedia.org/r/337808
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5e76b594d8c57fa9a679088c794b04c7879be715
Gerrit-PatchSet: 6
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Ema <[email protected]>
Gerrit-Reviewer: BBlack <[email protected]>
Gerrit-Reviewer: Elukey <[email protected]>
Gerrit-Reviewer: Ema <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
