Faidon Liambotis has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/87548


Change subject: base: fix check-raid to handle no or multiple LDs
......................................................................

base: fix check-raid to handle no or multiple LDs

The latest fix to check-raid was halfway there: it fixed the case where
no configured LDs exist. Unfortunately, there are also cases where no
output at all is printed, and we should handle this gracefully as well.

Finally, the current check is badly broken in that it only reports
errors if the *last* LD is non-optimal. We have boxes with as many as 14
LDs, so we should check all of them. Plus, the "logical device(s)"
message was completely wrong, as it was reporting the number of physical
drives in the last LD.

Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
---
M modules/base/files/monitoring/check-raid.py
1 file changed, 28 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/48/87548/1

diff --git a/modules/base/files/monitoring/check-raid.py b/modules/base/files/monitoring/check-raid.py
old mode 100755
new mode 100644
index 4d816c2..4c58a27
--- a/modules/base/files/monitoring/check-raid.py
+++ b/modules/base/files/monitoring/check-raid.py
@@ -235,23 +235,34 @@
     stateRegex = re.compile('^State\s*:\s*([^\n]*)')
     drivesRegex = re.compile('^Number Of Drives( per span)?\s*:\s*([^\n]*)')
     configuredRegex = re.compile('^Adapter \d+: No Virtual Drive Configured')
-    state = None
-    numDrives = None
-    configured = True
+    numPD = numLD = failedLD = 0
+    states = []
+    lines = 0
+    match = False
+
     for line in proc.stdout:
+        if len(line.strip()) and not line.startswith('Exit Code'):
+            lines += 1
+
         m = stateRegex.match(line)
         if m is not None:
+            match = True
+            numLD += 1
             state = m.group(1)
+            if state != 'Optimal':
+                failedLD += 1
+                states.append(state)
             continue
 
         m = drivesRegex.match(line)
         if m is not None:
-            numDrives = int(m.group(2))
+            match = True
+            numPD += int(m.group(2))
             continue
 
-        c = configuredRegex.match(line)
-        if c is not None:
-            configured = False
+        m = configuredRegex.match(line)
+        if m is not None:
+            match = True
             continue
 
     ret = proc.wait()
@@ -259,19 +270,19 @@
         print 'WARNING: MegaCli64 returned exit status %d' % (ret)
         return 1
 
-    if numDrives is None:
-        if configured:
-            print 'WARNING: Parse error processing MegaCli64 output'
-            return 1
-        else:
-            print 'OK: No disks configured for RAID'
-            return 0
+    if not match and lines > 0:
+        print 'WARNING: Parse error processing MegaCli64 output'
+        return 1
 
-    if state != 'Optimal':
-        print 'CRITICAL: %s' % (state)
+    if numLD == 0:
+        print 'OK: No disks configured for RAID'
+        return 0
+
+    if failedLD > 0:
+        print 'CRITICAL: %d failed logical drive(s) (%s)' % (failedLD, ", ".join(states))
         return 2
 
-    print 'OK: State is %s, checked %d logical device(s)' % (state, numDrives)
+    print 'OK: State is Optimal, checked %d logical drive(s), %d physical drive(s)' % (numLD, numPD)
     return 0
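
For reviewers who want to try the new decision logic without a RAID
controller at hand, the patched loop can be approximated as a standalone
function. This is only a sketch under assumptions: it is written for
Python 3 (the script itself uses Python 2 print statements), the name
summarize() is hypothetical, it returns the exit code and message instead
of printing, and it strips whitespace from the state string, which the
patch does not do. The MegaCli64 exit status check is left out.

```python
import re

# Same patterns as in the patch, written as raw strings.
STATE_RE = re.compile(r'^State\s*:\s*([^\n]*)')
DRIVES_RE = re.compile(r'^Number Of Drives( per span)?\s*:\s*([^\n]*)')
UNCONFIGURED_RE = re.compile(r'^Adapter \d+: No Virtual Drive Configured')


def summarize(lines):
    """Return (exit_code, message) for an iterable of output lines.

    Hypothetical helper: mirrors the counting done in the patched loop.
    """
    num_ld = num_pd = 0
    bad_states = []
    nonblank = 0
    matched = False

    for line in lines:
        # Count meaningful lines so empty output is not a parse error.
        if line.strip() and not line.startswith('Exit Code'):
            nonblank += 1

        m = STATE_RE.match(line)
        if m:
            matched = True
            num_ld += 1
            state = m.group(1).strip()  # strip() is an addition here
            if state != 'Optimal':
                bad_states.append(state)
            continue

        m = DRIVES_RE.match(line)
        if m:
            matched = True
            num_pd += int(m.group(2))
            continue

        if UNCONFIGURED_RE.match(line):
            matched = True

    if not matched and nonblank > 0:
        return 1, 'WARNING: Parse error processing MegaCli64 output'
    if num_ld == 0:
        return 0, 'OK: No disks configured for RAID'
    if bad_states:
        return 2, 'CRITICAL: %d failed logical drive(s) (%s)' % (
            len(bad_states), ', '.join(bad_states))
    return 0, ('OK: State is Optimal, checked %d logical drive(s), '
               '%d physical drive(s)' % (num_ld, num_pd))
```

Feeding it lines where an earlier LD is non-optimal and the last one is
Optimal should now yield exit code 2, which the old last-state-wins logic
would have missed. Empty input falls through to the "No disks configured"
branch instead of a parse error.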
 
 

-- 
To view, visit https://gerrit.wikimedia.org/r/87548
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
