Faidon Liambotis has uploaded a new change for review.
https://gerrit.wikimedia.org/r/87548
Change subject: base: fix check-raid to handle no or multiple LDs
......................................................................
base: fix check-raid to handle no or multiple LDs
The latest fix to check-raid was only halfway there: it fixed the case
where no configured LDs exist. Unfortunately, there are also cases where
no output at all is printed, and we should handle those gracefully as well.
Finally, the current check is badly broken in that it only reports
errors if the *last* LD is non-optimal. We have boxes with as many as 14
LDs, so we should check all of them. Plus, the "physical device(s)"
comment was completely wrong, as it was counting logical drives.
Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
---
M modules/base/files/monitoring/check-raid.py
1 file changed, 28 insertions(+), 17 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/48/87548/1
diff --git a/modules/base/files/monitoring/check-raid.py b/modules/base/files/monitoring/check-raid.py
old mode 100755
new mode 100644
index 4d816c2..4c58a27
--- a/modules/base/files/monitoring/check-raid.py
+++ b/modules/base/files/monitoring/check-raid.py
@@ -235,23 +235,34 @@
stateRegex = re.compile('^State\s*:\s*([^\n]*)')
drivesRegex = re.compile('^Number Of Drives( per span)?\s*:\s*([^\n]*)')
configuredRegex = re.compile('^Adapter \d+: No Virtual Drive Configured')
- state = None
- numDrives = None
- configured = True
+ numPD = numLD = failedLD = 0
+ states = []
+ lines = 0
+ match = False
+
for line in proc.stdout:
+ if len(line.strip()) and not line.startswith('Exit Code'):
+ lines += 1
+
m = stateRegex.match(line)
if m is not None:
+ match = True
+ numLD += 1
state = m.group(1)
+ if state != 'Optimal':
+ failedLD += 1
+ states.append(state)
continue
m = drivesRegex.match(line)
if m is not None:
- numDrives = int(m.group(2))
+ match = True
+ numPD += int(m.group(2))
continue
- c = configuredRegex.match(line)
- if c is not None:
- configured = False
+ m = configuredRegex.match(line)
+ if m is not None:
+ match = True
continue
ret = proc.wait()
@@ -259,19 +270,19 @@
print 'WARNING: MegaCli64 returned exit status %d' % (ret)
return 1
- if numDrives is None:
- if configured:
- print 'WARNING: Parse error processing MegaCli64 output'
- return 1
- else:
- print 'OK: No disks configured for RAID'
- return 0
+ if not match and lines > 0:
+ print 'WARNING: Parse error processing MegaCli64 output'
+ return 1
- if state != 'Optimal':
- print 'CRITICAL: %s' % (state)
+ if numLD == 0:
+ print 'OK: No disks configured for RAID'
+ return 0
+
+ if failedLD > 0:
+ print 'CRITICAL: %d failed logical drive(s) (%s)' % (failedLD, ", ".join(states))
return 2
- print 'OK: State is %s, checked %d logical device(s)' % (state, numDrives)
+ print 'OK: State is Optimal, checked %d logical drive(s), %d physical drive(s)' % (numLD, numPD)
return 0
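
For reviewers who want to exercise the new logic without MegaCli64 installed, here is a minimal standalone sketch of the aggregation the patch introduces. It is not the code in check-raid.py: the function name summarize_megacli and the return-a-tuple interface are hypothetical, it is written for Python 3 rather than the Python 2 of the original script, and it strips trailing whitespace from the captured state, which the patch does not do. Because the diff above has lost its indentation, whether states.append() sits inside or outside the "not Optimal" branch is an assumption; this sketch records every state.

```python
import re

# Same patterns as in the patch, written as raw strings.
STATE_RE = re.compile(r'^State\s*:\s*([^\n]*)')
DRIVES_RE = re.compile(r'^Number Of Drives( per span)?\s*:\s*([^\n]*)')
UNCONFIGURED_RE = re.compile(r'^Adapter \d+: No Virtual Drive Configured')


def summarize_megacli(output_lines):
    """Return (nagios_status, message) for an iterable of output lines.

    Hypothetical helper: mirrors the patched control flow, but returns the
    result instead of printing it so that it can be tested in isolation.
    """
    num_pd = num_ld = failed_ld = 0
    states = []
    nonblank = 0      # lines with content, ignoring the 'Exit Code' trailer
    matched = False   # True once any of the three patterns has matched

    for line in output_lines:
        if line.strip() and not line.startswith('Exit Code'):
            nonblank += 1

        m = STATE_RE.match(line)
        if m is not None:
            matched = True
            num_ld += 1
            # Assumption: strip trailing whitespace; the patch compares as-is.
            state = m.group(1).strip()
            if state != 'Optimal':
                failed_ld += 1
            # Assumption: every LD's state is recorded, not only failed ones.
            states.append(state)
            continue

        m = DRIVES_RE.match(line)
        if m is not None:
            matched = True
            num_pd += int(m.group(2))
            continue

        if UNCONFIGURED_RE.match(line):
            matched = True

    if not matched and nonblank > 0:
        return 1, 'WARNING: Parse error processing MegaCli64 output'
    if num_ld == 0:
        return 0, 'OK: No disks configured for RAID'
    if failed_ld > 0:
        return 2, 'CRITICAL: %d failed logical drive(s) (%s)' % (
            failed_ld, ', '.join(states))
    return 0, ('OK: State is Optimal, checked %d logical drive(s), '
               '%d physical drive(s)' % (num_ld, num_pd))


if __name__ == '__main__':
    # Made-up input shaped only to match the regexes above; the degraded LD
    # is deliberately not the last one, which the old check would have missed.
    sample = [
        'State : Degraded\n',
        'Number Of Drives : 2\n',
        'State : Optimal\n',
        'Number Of Drives : 4\n',
    ]
    print(summarize_megacli(sample))
```

Feeding it an empty list, or only blank lines and an "Exit Code" line, takes the "No disks configured" path, which is the no-output case described in the commit message.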
--
To view, visit https://gerrit.wikimedia.org/r/87548
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>