Dzahn has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/188480

Change subject: let icinga auto restart gitblit when it goes down
......................................................................

let icinga auto restart gitblit when it goes down

Using the Icinga/Nagios feature of eventhandlers, let Icinga
automatically restart the gitblit service when it goes down.

Consists of:

- the eventhandler script itself, which checks the service state and acts on it
- an additional parameter on monitoring::service to set an eventhandler script
  (depends on Ic2a57a15c1cd0b4)
- a nagios command definition

refs: http://docs.icinga.org/latest/en/eventhandlers.html
      https://docs.puppetlabs.com/references/latest/type.html#nagioscommand
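
The decision the eventhandler makes from the three macros can be sketched
as a small shell function. This is only an illustration, not part of the
change: the name should_restart and the soft-attempt threshold of 3 are
assumptions, and the loop is a dry run that prints the decision instead of
calling ssh.

```shell
#!/bin/bash
# Hypothetical helper, not part of the change: decides whether the event
# handler should restart the service for a given set of Icinga macro values.
#   $1 = $SERVICESTATE$      (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 = $SERVICESTATETYPE$  (SOFT, HARD)
#   $3 = $SERVICEATTEMPT$    (check attempt number)
should_restart() {
    local state="$1" state_type="$2" attempt="$3"
    local soft_attempt=3   # assumed threshold, matching the "3)" branch

    # Only CRITICAL results are acted on.
    [ "$state" = "CRITICAL" ] || return 1

    case "$state_type" in
    SOFT)
        # Restart once, on the chosen soft attempt only.
        [ "$attempt" = "$soft_attempt" ]
        ;;
    HARD)
        # Last try after the retries are exhausted.
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}

# Dry run: print the decision instead of restarting anything.
for args in "CRITICAL SOFT 1" "CRITICAL SOFT 3" "CRITICAL HARD 4" "WARNING HARD 4"; do
    # Word splitting of $args is intentional here.
    if should_restart $args; then
        echo "$args -> restart"
    else
        echo "$args -> no action"
    fi
done
```

Keeping the decision separate from the ssh call makes it possible to check
the state handling on a workstation before the handler is wired into Icinga.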

Change-Id: I0774b11db0dbf63ca42585b4e11b3de66dc317ba
---
M manifests/role/gitblit.pp
A modules/icinga/files/restart-gitblit.sh
M modules/icinga/manifests/init.pp
3 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/80/188480/1

diff --git a/manifests/role/gitblit.pp b/manifests/role/gitblit.pp
index d5c2711..3b20591 100644
--- a/manifests/role/gitblit.pp
+++ b/manifests/role/gitblit.pp
@@ -20,9 +20,15 @@
         rule => 'proto tcp dport 8080 { saddr $INTERNAL ACCEPT; }'
     }
 
+    @@nagios_command { 'restart-gitblit':
+        command_name => 'restart-gitblit',
+        command_line => '/usr/lib/nagios/eventhandlers/restart-gitblit.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$',
+    }
+
     monitoring::service { 'gitblit_web':
         description   => 'git.wikimedia.org',
         check_command => 'check_http_url!git.wikimedia.org!/tree/mediawiki%2Fcore.git',
+        event_handler => 'restart-gitblit',
     }
 
     nrpe::monitor_service { 'gitblit_process':
diff --git a/modules/icinga/files/restart-gitblit.sh b/modules/icinga/files/restart-gitblit.sh
new file mode 100644
index 0000000..a6667eb
--- /dev/null
+++ b/modules/icinga/files/restart-gitblit.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+# Event handler script for restarting gitblit via icinga
+# restart gitblit on antimony (http://git.wikimedia.org)
+# after the third CRIT in a SOFT state or the first one in a HARD state
+
+case "$1" in
+OK)
+        # The service just came back up, so don't do anything...
+        ;;
+WARNING)
+        # We don't really care about warning states, since the service is probably still running...
+        ;;
+UNKNOWN)
+        # We don't know what might be causing an unknown error, so don't do anything...
+        ;;
+CRITICAL)
+        # Aha!  The service appears to have a problem - perhaps we should restart the server...
+
+        # Is this a "soft" or a "hard" state?
+        case "$2" in
+
+        # We're in a "soft" state, meaning that Icinga is in the middle of retrying the
+        # check before it turns into a "hard" state and contacts get notified...
+        SOFT)
+
+                # What check attempt are we on?  We don't want to restart gitblit on the first
+                # check, because it may just be a fluke!
+                case "$3" in
+
+                # Wait until the check has been tried 3 times before restarting gitblit.
+                # If the check fails on the 4th time (after we restart gitblit), the state
+                # type will turn to "hard" and contacts will be notified of the problem.
+                # Hopefully this will restart gitblit successfully, so the 4th check will
+                # result in a "soft" recovery.  If that happens no one gets notified because we
+                # fixed the problem!
+                3)
+                        echo -n "Restarting gitblit service (3rd soft critical state)..."
+                        ssh [email protected] -C 'service gitblit restart'
+                        ;;
+                esac
+                ;;
+
+        # The gitblit service somehow managed to turn into a hard error without getting fixed.
+        # It should have been restarted by the code above, but for some reason it wasn't.
+        # Let's give it one last try, shall we?
+        # Note: Contacts have already been notified of a problem with the service at this
+        # point (unless you disabled notifications for this service)
+        HARD)
+                echo -n "Restarting gitblit service..."
+                ssh [email protected] -C 'service gitblit restart'
+                ;;
+        esac
+        ;;
+esac
+exit 0
+
diff --git a/modules/icinga/manifests/init.pp b/modules/icinga/manifests/init.pp
index 7beb4b8..0a74618 100644
--- a/modules/icinga/manifests/init.pp
+++ b/modules/icinga/manifests/init.pp
@@ -155,4 +155,19 @@
         group  => 'www-data',
         mode   => '0664',
     }
+
+    file { '/usr/lib/nagios/eventhandlers':
+        ensure => directory,
+        owner  => 'icinga',
+        group  => 'icinga',
+        mode   => '0755',
+    }
+
+    file { '/usr/lib/nagios/eventhandlers/restart-gitblit.sh':
+        ensure => present,
+        owner  => 'icinga',
+        group  => 'icinga',
+        mode   => '0750',
+        source => 'puppet:///modules/icinga/restart-gitblit.sh',
+    }
 }

-- 
To view, visit https://gerrit.wikimedia.org/r/188480
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0774b11db0dbf63ca42585b4e11b3de66dc317ba
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Dzahn <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
