Dzahn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/361023 )

Change subject: icinga/role:mail::mx: add monitoring of exim queue size
......................................................................


icinga/role:mail::mx: add monitoring of exim queue size

Adds a new Icinga plugin check_exim_queue using bash and
exipick.

Adds NRPE monitoring service in MX role class, assuming it
is preferred that I put it there rather than in the module.

Using 1000 for WARN and 3000 for CRIT. I observed values around
300 on mx1001 the other day when testing it.

Bug: T133110
Change-Id: I70bdef87eed6902ad27c92f2fa0e19b3d2274d7d
---
A modules/icinga/files/check_exim_queue.sh
M modules/role/manifests/mail/mx.pp
2 files changed, 92 insertions(+), 0 deletions(-)

Approvals:
  Dzahn: Verified; Looks good to me, approved



diff --git a/modules/icinga/files/check_exim_queue.sh b/modules/icinga/files/check_exim_queue.sh
new file mode 100755
index 0000000..6446fc5
--- /dev/null
+++ b/modules/icinga/files/check_exim_queue.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# Nagios/Icinga plugin to check for oversized exim4 queues.
+#
+# Daniel Zahn - Wikimedia Foundation Inc.
+#
+# https://phabricator.wikimedia.org/T133110
+#
+# ./check_exim_queue -w <warn> -c <crit>
+#
+# <warn> = number of mails in queue that trigger a WARN (int)
+# <crit> = number of mails in queue that trigger a CRIT (int)
+#
+# dependencies: exipick, sudo
+
+set -eu
+
+usage() { echo "Usage: $0 -w <warn> -c <crit>" 1>&2; exit 1; }
+
+declare -i WARN_LIMIT=0
+declare -i CRIT_LIMIT=0
+
+# count only messages older than MIN_AGE
+MIN_AGE="10m"
+
+while getopts "w:c:" o; do
+    case "${o}" in
+    w)
+       WARN_LIMIT=${OPTARG}
+       ;;
+    c)
+       CRIT_LIMIT=${OPTARG}
+       ;;
+    *)
+       usage
+       ;;
+    esac
+done
+
+if [ $WARN_LIMIT == 0 ] || [ $CRIT_LIMIT == 0 ]; then
+    usage
+fi
+
+declare -i QSIZE=0
+
+SUDO="/usr/bin/sudo"
+EXIPICK="/usr/sbin/exipick"
+
+# number of messages in queue older than $MIN_AGE
+QSIZE="$(${SUDO} ${EXIPICK} -bpc -o ${MIN_AGE})"
+
+# echo "QSIZE: ${QSIZE} WARN: ${WARN_LIMIT} CRIT: ${CRIT_LIMIT}"
+
+if [ "$QSIZE" -ge "$CRIT_LIMIT" ] ; then
+    echo "CRITICAL: ${QSIZE} mails in exim queue."
+    exit 2
+fi
+
+if [ "$QSIZE" -ge "$WARN_LIMIT" ] ; then
+    echo "WARNING: ${QSIZE} mails in exim queue."
+    exit 1
+fi
+
+if [ "$QSIZE" -lt "$WARN_LIMIT" ] && [ "$QSIZE" -lt "$CRIT_LIMIT" ] ; then
+    echo "OK: Less than ${WARN_LIMIT} mails in exim queue."
+    exit 0
+fi
+
+echo "UNKNOWN: something went wrong. check plugin ($0)."
+exit 3
+
diff --git a/modules/role/manifests/mail/mx.pp b/modules/role/manifests/mail/mx.pp
index 905b309..d860227 100644
--- a/modules/role/manifests/mail/mx.pp
+++ b/modules/role/manifests/mail/mx.pp
@@ -106,4 +106,26 @@
         ensure => 'present',
         source => 'puppet:///modules/role/exim/logrotate/exim4-base.mx',
     }
+
+    # monitor mail queue size (T133110)
+    file { '/usr/local/lib/nagios/plugins/check_exim_queue':
+        ensure => present,
+        owner  => 'root',
+        group  => 'root',
+        mode   => '0555',
+        source => 'puppet:///modules/icinga/check_exim_queue.sh',
+    }
+
+    ::sudo::user { 'nagios_exim_queue':
+        user       => 'nagios',
+        privileges => ['ALL = NOPASSWD: /usr/sbin/exipick -bpc -o [[\:digit\:]][[\:digit\:]][mh]'],
+    }
+
+    nrpe::monitor_service { 'check_exim_queue':
+        description    => 'exim queue',
+        nrpe_command   => '/usr/local/lib/nagios/plugins/check_exim_queue -w 1000 -c 3000',
+        check_interval => 30,
+        retry_interval => 10,
+        timeout        => 20,
+    }
 }
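
The sudo rule above only permits an age argument of exactly two digits
followed by "m" or "h", so the plugin's MIN_AGE has to stay in that shape. A
rough sketch of that constraint follows. The helper age_allowed is
hypothetical, and bash pattern matching is used here as a stand-in for how
sudo matches command arguments, which is an approximation.

```shell
#!/bin/bash
# Hypothetical helper: tests an age string against the same glob that the
# sudo rule uses for the exipick -o argument. Bash's [[ == ]] pattern
# matching is assumed to behave like sudo's wildcard matching here.
age_allowed() {
    [[ $1 == [[:digit:]][[:digit:]][mh] ]]
}

for age in 10m 24h 5m 100m 10s; do
    if age_allowed "$age"; then
        echo "${age}: allowed"
    else
        echo "${age}: rejected"
    fi
done
```

Under that assumption, the script's current MIN_AGE of "10m" passes, while
values such as "5m" or "100m" would need the sudo rule to be widened first.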

-- 
To view, visit https://gerrit.wikimedia.org/r/361023

Gerrit-MessageType: merged
Gerrit-Change-Id: I70bdef87eed6902ad27c92f2fa0e19b3d2274d7d
Gerrit-PatchSet: 12
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Dzahn <dz...@wikimedia.org>
Gerrit-Reviewer: Alexandros Kosiaris <akosia...@wikimedia.org>
Gerrit-Reviewer: Dzahn <dz...@wikimedia.org>
Gerrit-Reviewer: Filippo Giunchedi <fgiunch...@wikimedia.org>
Gerrit-Reviewer: Herron <kher...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>
