coren has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/258168
Change subject: Labs: Add a timeout check to getent (via ldap) on labstore
......................................................................

Labs: Add a timeout check to getent (via ldap) on labstore

Reading the group database via LDAP is the functionality NFS
relies on from the rest of the labs infrastructure. This tests
that it works and returns in reasonable (subsecond) time and
raises an alert otherwise.

Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
---
A modules/labstore/files/getent_check
M modules/labstore/manifests/monitoring.pp
2 files changed, 41 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/68/258168/1

diff --git a/modules/labstore/files/getent_check b/modules/labstore/files/getent_check
new file mode 100644
index 0000000..a40c66e
--- /dev/null
+++ b/modules/labstore/files/getent_check
@@ -0,0 +1,25 @@
+#! /bin/bash
+
+# Plant a sigalarm and catch it
+# for a hard timeout
+#
+sleep 1 && kill -14 $$ &
+trap timeout 14
+
+timeout() {
+    echo "CRITICAL: getent group tools.admin timed out (>1s)"
+    exit 2
+}
+
+/usr/bin/useldap /usr/bin/getent group tools.admin >/dev/null 2>&1
+rv=$?
+
+trap '' 14
+wait
+
+if [ $rv -ne 0 ]; then
+    echo "CRITICAL: getent group tools.admin failed"
+    exit 2
+fi
+echo "OK: getent group returns within a second"
+exit 0
diff --git a/modules/labstore/manifests/monitoring.pp b/modules/labstore/manifests/monitoring.pp
index fe34df9..0eb231a 100644
--- a/modules/labstore/manifests/monitoring.pp
+++ b/modules/labstore/manifests/monitoring.pp
@@ -47,4 +47,20 @@
         critical   => '24',
         percentage => '50', # Don't freak out on spikes
     }
+
+    # Monitor that getent passwd over LDAP resolves in reasonable time
+    # (this being the mechanism that NFS uses to fetch groups)
+    nrpe::monitor_service { 'getent_check':
+        nrpe_command => '/usr/local/bin/getent_check',
+        description  => 'Getent speed check',
+        require      => File['/usr/local/bin/getent_check'],
+    }
+
+    file { '/usr/local/bin/getent_check':
+        ensure => present,
+        source => 'puppet:///modules/labstore/getent_check',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+    }
 }

-- 
To view, visit https://gerrit.wikimedia.org/r/258168

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: coren <mpellet...@wikimedia.org>
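
For comparison, the deadline check that getent_check performs can be sketched with timeout(1) from GNU coreutils instead of the SIGALRM trap used in the change. This is a different technique from the one in the patch, and it assumes coreutils is installed on the host. The helper name check_with_deadline and its arguments are hypothetical and do not appear in the change. One reason to consider this variant: the bash manual states that a trap is deferred until the foreground command completes, so a trap-based check can report a slow lookup but is unlikely to cut a hung one short, whereas timeout(1) kills the command at the limit.

```shell
#!/bin/bash
# Sketch only: check_with_deadline is a hypothetical helper, not part of the change.
# It relies on timeout(1) from GNU coreutils, which exits with status 124
# when it had to kill the command for exceeding the limit.

check_with_deadline() {
    local limit=$1 rv
    shift

    # Run the command under a hard deadline, discarding its output.
    timeout "$limit" "$@" >/dev/null 2>&1
    rv=$?

    if [ "$rv" -eq 124 ]; then
        echo "CRITICAL: $* timed out (>${limit}s)"
        return 2
    elif [ "$rv" -ne 0 ]; then
        echo "CRITICAL: $* failed"
        return 2
    fi
    echo "OK: $* returned within ${limit}s"
    return 0
}
```

A check script built on this sketch would end with something like `check_with_deadline 1 /usr/bin/getent group tools.admin; exit $?`, keeping the same exit codes (0 for OK, 2 for CRITICAL) that the script in the change returns to NRPE.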