coren has submitted this change and it was merged.

Change subject: Labs: Add a timeout check to getent (via ldap) on labstore
......................................................................

Labs: Add a timeout check to getent (via ldap) on labstore

Reading the group database via LDAP is the functionality
NFS relies on from the rest of the labs infrastructure.
This tests that it works and returns in reasonable (subsecond)
time and raises an alert otherwise.

Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
---
A modules/labstore/files/getent_check
M modules/labstore/manifests/monitoring.pp
2 files changed, 50 insertions(+), 0 deletions(-)

Approvals:
  Andrew Bogott: Looks good to me, but someone else must approve
  coren: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/modules/labstore/files/getent_check b/modules/labstore/files/getent_check
new file mode 100644
index 0000000..1d1e21e
--- /dev/null
+++ b/modules/labstore/files/getent_check
@@ -0,0 +1,34 @@
+#! /bin/bash
+
+# Plant a sigalarm and catch it
+# for a hard timeout
+
+timeout() {
+    echo "CRITICAL: getent group tools.admin timed out (>1s)"
+    exit 2
+}
+
+
+trap timeout 14
+sleep 1 && kill -14 $$ &
+
+# Try to fetch a known group via LDAP hee
+
+/usr/bin/useldap /usr/bin/getent group tools.admin >/dev/null 2>&1
+rv=$?
+
+# At this point, the command returns (worked or failed, so
+# remove the trap and wait for the timeout to pass.
+
+trap '' 14
+wait
+
+# Not timed out, but could still have failed.
+
+if [ $rv -ne 0 ]; then
+    echo "CRITICAL: getent group tools.admin failed"
+    exit 2
+fi
+
+echo "OK: getent group returns within a second"
+exit 0
diff --git a/modules/labstore/manifests/monitoring.pp b/modules/labstore/manifests/monitoring.pp
index fe34df9..0eb231a 100644
--- a/modules/labstore/manifests/monitoring.pp
+++ b/modules/labstore/manifests/monitoring.pp
@@ -47,4 +47,20 @@
         critical => '24',
         percentage => '50', # Don't freak out on spikes
     }
+
+    # Monitor that getent passwd over LDAP resolves in reasonable time
+    # (this being the mechanism that NFS uses to fetch groups)
+    nrpe::monitor_service { 'getent_check':
+        nrpe_command => '/usr/local/bin/getent_check',
+        description  => 'Getent speed check',
+        require      => File['/usr/local/bin/getent_check'],
+    }
+
+    file { '/usr/local/bin/getent_check':
+        ensure => present,
+        source => 'puppet:///modules/labstore/getent_check',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+    }
 }

--
To view, visit https://gerrit.wikimedia.org/r/258168
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia0d41eb39ff43cd94b5261875445d7832694f647
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: coren <mpellet...@wikimedia.org>
Gerrit-Reviewer: Andrew Bogott <abog...@wikimedia.org>
Gerrit-Reviewer: Chasemp <r...@wikimedia.org>
Gerrit-Reviewer: Faidon Liambotis <fai...@wikimedia.org>
Gerrit-Reviewer: coren <mpellet...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
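
Note on the timeout technique: the merged script builds its hard timeout from a SIGALRM trap plus a backgrounded `sleep`. For comparison only, the same check shape (timed out, failed, or OK, reported with Nagios-style exit codes) can be sketched with the GNU coreutils `timeout` command, which exits with status 124 when the time limit is hit. This is not what the change uses; the function name `check_with_timeout` and its message wording are hypothetical, and it assumes GNU `timeout` is installed on the host.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of the merged change: run a command under
# a time limit and report the result the way an NRPE/Nagios check would.
# Assumes GNU coreutils `timeout`, which exits 124 when the limit expires.
check_with_timeout() {
    local limit="$1"
    shift

    # Run the command with a hard limit of $limit seconds, discarding output.
    timeout "$limit" "$@" >/dev/null 2>&1
    local rv=$?

    if [ "$rv" -eq 124 ]; then
        # The limit expired before the command finished.
        echo "CRITICAL: $* timed out (>${limit}s)"
        return 2
    elif [ "$rv" -ne 0 ]; then
        # Finished in time, but reported a failure.
        echo "CRITICAL: $* failed"
        return 2
    fi

    echo "OK: $* returned within ${limit}s"
    return 0
}
```

Under those assumptions, a call such as `check_with_timeout 1 /usr/bin/getent group tools.admin` would cover the same three outcomes as the script in the diff. One difference worth noting: the trap-based script always waits out the full second before reporting, while this variant returns as soon as the command finishes.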