coren has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/258168

Change subject: Labs: Add a timeout check to getent (via ldap) on labstore
......................................................................

Labs: Add a timeout check to getent (via ldap) on labstore

Reading the group database via LDAP is the functionality
NFS relies on from the rest of the labs infrastructure.  This
tests that it works and returns in reasonable (subsecond)
time and raises an alert otherwise.

Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
---
A modules/labstore/files/getent_check
M modules/labstore/manifests/monitoring.pp
2 files changed, 41 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/68/258168/1

diff --git a/modules/labstore/files/getent_check b/modules/labstore/files/getent_check
new file mode 100644
index 0000000..a40c66e
--- /dev/null
+++ b/modules/labstore/files/getent_check
@@ -0,0 +1,25 @@
+#! /bin/bash
+
+# Plant a sigalarm and catch it
+# for a hard timeout
+#
+sleep 1 && kill -14 $$ &
+trap timeout 14
+
+timeout() {
+    echo "CRITICAL: getent group tools.admin timed out (>1s)"
+    exit 2
+}
+
+/usr/bin/useldap /usr/bin/getent group tools.admin >/dev/null 2>&1
+rv=$?
+
+trap '' 14
+wait
+
+if [ $rv -ne 0 ]; then
+    echo "CRITICAL: getent group tools.admin failed"
+    exit 2
+fi
+echo "OK: getent group returns within a second"
+exit 0
diff --git a/modules/labstore/manifests/monitoring.pp b/modules/labstore/manifests/monitoring.pp
index fe34df9..0eb231a 100644
--- a/modules/labstore/manifests/monitoring.pp
+++ b/modules/labstore/manifests/monitoring.pp
@@ -47,4 +47,20 @@
         critical    => '24',
         percentage  => '50', # Don't freak out on spikes
     }
+
+    # Monitor that getent group over LDAP resolves in reasonable time
+    # (this being the mechanism that NFS uses to fetch groups)
+    nrpe::monitor_service { 'getent_check':
+        nrpe_command => '/usr/local/bin/getent_check',
+        description  => 'Getent speed check',
+        require      => File['/usr/local/bin/getent_check'],
+    }
+
+    file { '/usr/local/bin/getent_check':
+        ensure => present,
+        source => 'puppet:///modules/labstore/getent_check',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+    }
 }
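
For comparison with getent_check above, here is a minimal sketch of the same
pattern (run a command under a hard time limit, report a Nagios-style status)
built on coreutils timeout(1) instead of a hand-planted SIGALRM. It is not
part of this change. The helper name check_cmd is hypothetical, and the
sketch assumes timeout(1) is installed on the host. One reason to consider
it: bash defers a trap handler until the foreground command has finished, so
a trap-based alarm fires only after a slow command eventually returns.

```shell
#!/bin/bash
# Sketch only: check_cmd is a hypothetical helper, not part of the patch.
# Usage: check_cmd LIMIT_SECONDS COMMAND [ARGS...]
check_cmd() {
    local limit="$1"
    shift
    local rv

    # timeout(1) kills the command after $limit seconds and exits 124.
    timeout "$limit" "$@" >/dev/null 2>&1
    rv=$?

    if [ "$rv" -eq 124 ]; then
        echo "CRITICAL: $1 timed out (>${limit}s)"
        return 2
    elif [ "$rv" -ne 0 ]; then
        echo "CRITICAL: $1 failed (exit $rv)"
        return 2
    fi
    echo "OK: $1 returned within ${limit}s"
    return 0
}
```

To mirror the check in this change, the script body would end with
`check_cmd 1 /usr/bin/useldap /usr/bin/getent group tools.admin; exit $?`.
Unlike the trap-based version, this returns as soon as the command does
rather than always waiting for the background sleep to finish.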

-- 
To view, visit https://gerrit.wikimedia.org/r/258168
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: coren <mpellet...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
