Dzahn has uploaded a new change for review. https://gerrit.wikimedia.org/r/188480
Change subject: let icinga auto restart gitblit when it goes down
......................................................................

let icinga auto restart gitblit when it goes down

Using the Icinga/Nagios feature of eventhandlers,
let Icinga automatically restart the gitblit service when it goes down.

Consists of:

- the eventhandler script itself, which checks the status
  and acts based upon it
- an additional parameter to a monitoring::service to set
  an eventhandler script (depends on Ic2a57a15c1cd0b4)
- a nagios command definition

refs:

http://docs.icinga.org/latest/en/eventhandlers.html
https://docs.puppetlabs.com/references/latest/type.html#nagioscommand

Change-Id: I0774b11db0dbf63ca42585b4e11b3de66dc317ba
---
M manifests/role/gitblit.pp
A modules/icinga/files/restart-gitblit.sh
M modules/icinga/manifests/init.pp
3 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/80/188480/1

diff --git a/manifests/role/gitblit.pp b/manifests/role/gitblit.pp
index d5c2711..3b20591 100644
--- a/manifests/role/gitblit.pp
+++ b/manifests/role/gitblit.pp
@@ -20,9 +20,15 @@
         rule => 'proto tcp dport 8080 { saddr $INTERNAL ACCEPT; }'
     }
 
+    @@nagios_command { 'restart-gitblit':
+        command_name => 'restart-gitblit',
+        command_line => '/usr/lib/nagios/eventhandlers/restart-gitblit.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$'
+    }
+
     monitoring::service { 'gitblit_web':
         description   => 'git.wikimedia.org',
         check_command => 'check_http_url!git.wikimedia.org!/tree/mediawiki%2Fcore.git',
+        event_handler => 'restart-gitblit',
     }
 
     nrpe::monitor_service { 'gitblit_process':
diff --git a/modules/icinga/files/restart-gitblit.sh b/modules/icinga/files/restart-gitblit.sh
new file mode 100644
index 0000000..a6667eb
--- /dev/null
+++ b/modules/icinga/files/restart-gitblit.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+# Event handler script for restarting gitblit via icinga
+# restart gitblit on antimony (http://git.wikimedia.org)
+# after the second CRIT in a SOFT state or the first one in a HARD state
+
+case "$1" in
+OK)
+    # The service just came back up, so don't do anything...
+    ;;
+WARNING)
+    # We don't really care about warning states, since the service is probably still running...
+    ;;
+UNKNOWN)
+    # We don't know what might be causing an unknown error, so don't do anything...
+    ;;
+CRITICAL)
+    # Aha! The service appears to have a problem - perhaps we should restart the server...
+
+    # Is this a "soft" or a "hard" state?
+    case "$2" in
+
+    # We're in a "soft" state, meaning that Icinga is in the middle of retrying the
+    # check before it turns into a "hard" state and contacts get notified...
+    SOFT)
+
+        # What check attempt are we on? We don't want to restart gitblit on the first
+        # check, because it may just be a fluke!
+        case "$3" in
+
+        # Wait until the check has been tried 3 times before restarting the web server.
+        # If the check fails on the 4th time (after we restart the web server), the state
+        # type will turn to "hard" and contacts will be notified of the problem.
+        # Hopefully this will restart the web server successfully, so the 4th check will
+        # result in a "soft" recovery. If that happens no one gets notified because we
+        # fixed the problem!
+        3)
+            echo -n "Restarting gitblit service (2nd soft critical state)..."
+            ssh [email protected] -C 'service gitblit restart'
+            ;;
+        esac
+        ;;
+
+    # The gitblit service somehow managed to turn into a hard error without getting fixed.
+    # It should have been restarted by the code above, but for some reason it didn't.
+    # Let's give it one last try, shall we?
+    # Note: Contacts have already been notified of a problem with the service at this
+    # point (unless you disabled notifications for this service)
+    HARD)
+        echo -n "Restarting gitblit service..."
+        ssh [email protected] -C 'service gitblit restart'
+        ;;
+    esac
+    ;;
+esac
+exit 0
+
diff --git a/modules/icinga/manifests/init.pp b/modules/icinga/manifests/init.pp
index 7beb4b8..0a74618 100644
--- a/modules/icinga/manifests/init.pp
+++ b/modules/icinga/manifests/init.pp
@@ -155,4 +155,19 @@
         group  => 'www-data',
         mode   => '0664',
     }
+
+    file { '/usr/lib/nagios/eventhandlers':
+        ensure => directory,
+        owner  => 'icinga',
+        group  => 'icinga',
+        mode   => '0664',
+    }
+
+    file { '/usr/lib/nagios/eventhandlers/restart-gitblit.sh':
+        ensure => present,
+        owner  => 'icinga',
+        group  => 'icinga',
+        mode   => '0750',
+        source => 'puppet:///modules/icinga/restart-gitblit.sh',
+    }
 }

--
To view, visit https://gerrit.wikimedia.org/r/188480
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0774b11db0dbf63ca42585b4e11b3de66dc317ba
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Dzahn <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
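
For reference, the restart decision described in the comments of restart-gitblit.sh can be sketched as a side-effect-free function that can be exercised without touching a real host. This is only an illustration, not part of the change: the function name should_restart is hypothetical, the three arguments are assumed to arrive in the order $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$, and the soft-state threshold of 3 is taken from the case label in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch): prints "restart" or "skip"
# for a given service state, state type and check attempt.
should_restart() {
    local state="$1" state_type="$2" attempt="$3"

    # Only CRITICAL states are acted upon; OK, WARNING and UNKNOWN are ignored.
    if [ "$state" != "CRITICAL" ]; then
        echo "skip"
        return
    fi

    case "$state_type" in
    SOFT)
        # Assumed threshold: act on the 3rd soft check attempt only.
        if [ "$attempt" = "3" ]; then
            echo "restart"
        else
            echo "skip"
        fi
        ;;
    HARD)
        # A hard critical state always triggers one more restart attempt.
        echo "restart"
        ;;
    *)
        echo "skip"
        ;;
    esac
}

should_restart CRITICAL SOFT 1   # prints: skip
should_restart CRITICAL SOFT 3   # prints: restart
should_restart CRITICAL HARD 4   # prints: restart
should_restart OK HARD 1         # prints: skip
```

Keeping the decision separate from the ssh call would make it possible to test the state handling on its own; whether that split is worth it for a script this small is a judgment call.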
