coren has submitted this change and it was merged.

Change subject: Labs: Add a timeout check to getent (via ldap) on labstore
......................................................................


Labs: Add a timeout check to getent (via ldap) on labstore

Reading the group database via LDAP is the functionality
NFS relies on from the rest of the labs infrastructure.  This
tests that it works and returns in reasonable (subsecond)
time and raises an alert otherwise.

Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
---
A modules/labstore/files/getent_check
M modules/labstore/manifests/monitoring.pp
2 files changed, 50 insertions(+), 0 deletions(-)

Approvals:
  Andrew Bogott: Looks good to me, but someone else must approve
  coren: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/labstore/files/getent_check b/modules/labstore/files/getent_check
new file mode 100644
index 0000000..1d1e21e
--- /dev/null
+++ b/modules/labstore/files/getent_check
@@ -0,0 +1,34 @@
+#! /bin/bash
+
+# Plant a sigalarm and catch it
+# for a hard timeout
+
+timeout() {
+    echo "CRITICAL: getent group tools.admin timed out (>1s)"
+    exit 2
+}
+
+
+trap timeout 14
+sleep 1 && kill -14 $$ &
+
+# Try to fetch a known group via LDAP here
+
+/usr/bin/useldap /usr/bin/getent group tools.admin >/dev/null 2>&1
+rv=$?
+
+# At this point, the command has returned (whether it worked or failed),
+# so remove the trap and wait for the timeout to pass.
+
+trap '' 14
+wait
+
+# Not timed out, but could still have failed.
+
+if [ $rv -ne 0 ]; then
+    echo "CRITICAL: getent group tools.admin failed"
+    exit 2
+fi
+
+echo "OK: getent group returns within a second"
+exit 0
diff --git a/modules/labstore/manifests/monitoring.pp b/modules/labstore/manifests/monitoring.pp
index fe34df9..0eb231a 100644
--- a/modules/labstore/manifests/monitoring.pp
+++ b/modules/labstore/manifests/monitoring.pp
@@ -47,4 +47,20 @@
         critical    => '24',
         percentage  => '50', # Don't freak out on spikes
     }
+
+    # Monitor that getent group over LDAP resolves in reasonable time
+    # (this being the mechanism that NFS uses to fetch groups)
+    nrpe::monitor_service { 'getent_check':
+        nrpe_command => '/usr/local/bin/getent_check',
+        description  => 'Getent speed check',
+        require      => File['/usr/local/bin/getent_check'],
+    }
+
+    file { '/usr/local/bin/getent_check':
+        ensure => present,
+        source => 'puppet:///modules/labstore/getent_check',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+    }
 }
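
For comparison, the same check can be sketched with GNU coreutils
`timeout` instead of a hand-rolled signal trap. This is an illustration
only, not part of the merged change. The function name
`check_with_timeout` and the wording of the OK line are assumptions; the
exit codes follow the script above (0 for OK, 2 for CRITICAL). One reason
to consider it: bash defers a trap until the current foreground command
has finished, so a trap-based timer may not cut a hung lookup short,
whereas `timeout` kills the child once the limit expires and exits with
status 124.

```shell
#!/bin/bash

# Hypothetical helper (not in the merged change): run a command under a
# hard time limit and print a Nagios-style status line.
# Relies on GNU coreutils `timeout`, which exits with status 124 when
# the limit is reached. A command that itself exits 124 would be
# reported as a timeout.
check_with_timeout() {
    local limit="$1"
    shift

    # Discard the command's output; only its exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    local rv=$?

    if [ "$rv" -eq 124 ]; then
        echo "CRITICAL: $* timed out (>${limit}s)"
        return 2
    elif [ "$rv" -ne 0 ]; then
        echo "CRITICAL: $* failed"
        return 2
    fi

    echo "OK: $* returned within ${limit}s"
    return 0
}
```

Under those assumptions, an NRPE wrapper would call
`check_with_timeout 1 /usr/bin/useldap /usr/bin/getent group tools.admin`
and exit with the function's return status.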

-- 
To view, visit https://gerrit.wikimedia.org/r/258168
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: coren <mpellet...@wikimedia.org>
Gerrit-Reviewer: Andrew Bogott <abog...@wikimedia.org>
Gerrit-Reviewer: Chasemp <r...@wikimedia.org>
Gerrit-Reviewer: Faidon Liambotis <fai...@wikimedia.org>
Gerrit-Reviewer: coren <mpellet...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
