Faidon Liambotis has submitted this change and it was merged.
Change subject: base: fix check-raid to handle no or multiple LDs
......................................................................
base: fix check-raid to handle no or multiple LDs
The latest fix to check-raid was halfway there, fixing the case where
no configured LDs exist. Unfortunately, there are also cases where no
output at all is printed; we should handle this gracefully as well.
Finally, the current check is very broken in the sense that it only
reports errors if the *last* LD is non-optimal. We have boxes with as
many as 14 LDs, so we should check all of them. Plus, the
"physical device(s)" comment was completely wrong, as it was counting
logical drives.
Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
---
M modules/base/files/monitoring/check-raid.py
1 file changed, 28 insertions(+), 17 deletions(-)
Approvals:
Faidon Liambotis: Looks good to me, approved
jenkins-bot: Verified
diff --git a/modules/base/files/monitoring/check-raid.py b/modules/base/files/monitoring/check-raid.py
old mode 100755
new mode 100644
index 4d816c2..4c58a27
--- a/modules/base/files/monitoring/check-raid.py
+++ b/modules/base/files/monitoring/check-raid.py
@@ -235,23 +235,34 @@
stateRegex = re.compile('^State\s*:\s*([^\n]*)')
drivesRegex = re.compile('^Number Of Drives( per span)?\s*:\s*([^\n]*)')
configuredRegex = re.compile('^Adapter \d+: No Virtual Drive Configured')
- state = None
- numDrives = None
- configured = True
+ numPD = numLD = failedLD = 0
+ states = []
+ lines = 0
+ match = False
+
for line in proc.stdout:
+ if len(line.strip()) and not line.startswith('Exit Code'):
+ lines += 1
+
m = stateRegex.match(line)
if m is not None:
+ match = True
+ numLD += 1
state = m.group(1)
+ if state != 'Optimal':
+ failedLD += 1
+ states.append(state)
continue
m = drivesRegex.match(line)
if m is not None:
- numDrives = int(m.group(2))
+ match = True
+ numPD += int(m.group(2))
continue
- c = configuredRegex.match(line)
- if c is not None:
- configured = False
+ m = configuredRegex.match(line)
+ if m is not None:
+ match = True
continue
ret = proc.wait()
@@ -259,19 +270,19 @@
print 'WARNING: MegaCli64 returned exit status %d' % (ret)
return 1
- if numDrives is None:
- if configured:
- print 'WARNING: Parse error processing MegaCli64 output'
- return 1
- else:
- print 'OK: No disks configured for RAID'
- return 0
+ if not match and lines > 0:
+ print 'WARNING: Parse error processing MegaCli64 output'
+ return 1
- if state != 'Optimal':
- print 'CRITICAL: %s' % (state)
+ if numLD == 0:
+ print 'OK: No disks configured for RAID'
+ return 0
+
+ if failedLD > 0:
+ print 'CRITICAL: %d failed logical drive(s) (%s)' % (failedLD, ", ".join(states))
return 2
- print 'OK: State is %s, checked %d logical device(s)' % (state, numDrives)
+ print 'OK: State is Optimal, checked %d logical drive(s), %d physical drive(s)' % (numLD, numPD)
return 0
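
For readers who want to try the new counting logic outside the Nagios
check, here is a minimal standalone sketch of it. It is not the merged
code: it is written for Python 3 (the script in the diff uses Python 2
print statements), the helper name summarize_megacli and the sample
lines are hypothetical, and the sample only approximates what MegaCli64
prints. The three regular expressions and the order of the checks are
taken from the diff above.

```python
import re

# Patterns copied from the diff, written as raw strings.
STATE_RE = re.compile(r'^State\s*:\s*([^\n]*)')
DRIVES_RE = re.compile(r'^Number Of Drives( per span)?\s*:\s*([^\n]*)')
UNCONFIGURED_RE = re.compile(r'^Adapter \d+: No Virtual Drive Configured')


def summarize_megacli(lines):
    """Hypothetical helper: map MegaCli64-style lines to (exit_code, message)."""
    num_pd = num_ld = 0
    bad_states = []      # state of every non-optimal logical drive
    seen_lines = 0       # non-empty lines other than the trailing exit code
    matched = False      # True once any known pattern has matched

    for line in lines:
        if line.strip() and not line.startswith('Exit Code'):
            seen_lines += 1

        m = STATE_RE.match(line)
        if m is not None:
            matched = True
            num_ld += 1
            state = m.group(1)
            if state != 'Optimal':
                bad_states.append(state)
            continue

        m = DRIVES_RE.match(line)
        if m is not None:
            matched = True
            num_pd += int(m.group(2))
            continue

        if UNCONFIGURED_RE.match(line) is not None:
            matched = True

    # Output was printed but nothing was recognised: treat as a parse error.
    if not matched and seen_lines > 0:
        return 1, 'WARNING: Parse error processing MegaCli64 output'
    # Covers both "no virtual drive configured" and no output at all.
    if num_ld == 0:
        return 0, 'OK: No disks configured for RAID'
    if bad_states:
        return 2, 'CRITICAL: %d failed logical drive(s) (%s)' % (
            len(bad_states), ', '.join(bad_states))
    return 0, ('OK: State is Optimal, checked %d logical drive(s), '
               '%d physical drive(s)' % (num_ld, num_pd))


# Assumed sample: the first LD is degraded, the last one is healthy.
sample = [
    'State               : Degraded\n',
    'Number Of Drives    : 2\n',
    'State               : Optimal\n',
    'Number Of Drives    : 4\n',
    'Exit Code: 0x00\n',
]
code, message = summarize_megacli(sample)
print(code, message)
```

With this sample, a check that only remembers the last "State" line
would see "Optimal"; counting every LD is what lets the degraded first
drive surface as a CRITICAL result.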
--
To view, visit https://gerrit.wikimedia.org/r/87548
Gerrit-MessageType: merged
Gerrit-Change-Id: I3a6709140c9516c3dedce4a2f55374749ffc2b14
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>
Gerrit-Reviewer: Faidon Liambotis <[email protected]>
Gerrit-Reviewer: jenkins-bot