Thanks Ernst, I didn't realize that smartctl could peek behind the RAID controllers; obviously I didn't read the man page! I'd noticed the smartd messages to the effect of "not capable of SMART Health Status check" but didn't dig deeper. Live and learn. Interestingly, in my current case with a failed drive, smartctl -H and -a both show the overall-health self-assessment as PASSED for all of the drives. Maybe it would have alerted prior to the drive failing.
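For the archives, here is a rough sketch of how the per-drive check could be looped over every device id that megacli reports. It is untested against a real PERC. The function names (device_ids, health_ok, check_all_drives) are made up for illustration, and it assumes that MegaCli64 prints lines of the form "Device Id: 5" and that smartctl -H prints either the ATA "test result: PASSED" line or the SCSI/SAS "SMART Health Status: OK" line.

```shell
#!/bin/sh
# Sketch: run "smartctl -H" against each drive behind a MegaRAID/PERC controller.
# Assumes MegaCli64 and smartctl are on PATH and that this runs as root.

# Print the device ids found in "MegaCli64 -Pdlist -aAll" output read from stdin.
# Assumes lines of the form "Device Id: 5". "Enclosure Device ID: 32" is skipped
# because the match is case-sensitive and anchored at the start of the line.
device_ids() {
    awk -F': *' '/^Device Id:/ { print $2 }'
}

# Read "smartctl -H" output on stdin; succeed only if it reports a healthy drive.
# ATA drives print "... test result: PASSED", SCSI/SAS drives print
# "SMART Health Status: OK" (assumed output formats).
health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Hypothetical driver: check every drive, print one line per drive, and
# return non-zero if any drive is not reported healthy.
check_all_drives() {
    dev=${1:-/dev/sda}   # any device on the controller; only the id matters
    rc=0
    for id in $(MegaCli64 -Pdlist -aAll | device_ids); do
        if smartctl -H -d "megaraid,$id" "$dev" | health_ok; then
            echo "megaraid,$id: OK"
        else
            echo "megaraid,$id: NOT OK"
            rc=1
        fi
    done
    return $rc
}
```

Sourcing the file and calling check_all_drives (optionally with a device name other than /dev/sda) should print one line per drive. Given that my failed drive still reports PASSED, I would treat this as one signal among several rather than a reliable failure predictor.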
At this point I've got OMSA installed on both systems and have begun to work on getting an NMS installed. Nagios with nagios-selinux from EPEL is not working reliably with SELinux, so I've begun to look into Icinga2, which looks to be able to work with Nagios plugins and hopefully check_openmanage. Even if I do get OMSA and an NMS working, this endeavor has encouraged me to learn about megacli and smartctl, and more learning is always a good thing!

--Larry

On Tue, Sep 5, 2017 at 3:51 AM, Ernst Pijper <[email protected]> wrote:

> One possibility is to use smartctl to check the health of the individual
> drives:
>
> smartctl -H -d megaraid,<N> /dev/sda
>
> where <N> is the device id returned by megacli: MegaCli64 -Pdlist -aAll |
> grep "Device Id". You can use the same device name /dev/sda
> for all device ids. The command only considers the device id.
>
> Output will be something like:
>
> # smartctl -H -d megaraid,5 /dev/sda
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.28.3.el7.x86_64]
> (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke,
> www.smartmontools.org
>
> /dev/sda [megaraid_disk_05] [SAT]: Device open changed type from
> 'megaraid,5' to 'sat+megaraid,5'
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Warning: This result is based on an Attribute check.
>
> Replace -H with -a for a more detailed analysis of the disk status.
>
> Don’t expect a 100% guarantee that smartctl will correctly predict
> failures.
>
> Ernst
>
> On 4 sep. 2017, at 14:07, Larry Fahnoe <[email protected]> wrote:
>
> Thanks for the thoughts and feedback! So far, it looks as though the
> conventional wisdom is to bite the bullet and install OMSA on the servers
> and then set up an NMS like Nagios. I've had troubles with OMSA in the past
> and I don't currently have an NMS running since my environments are rather
> small. Perhaps OMSA has improved, I will investigate it once again as well
> as reconsider Nagios and check_openmanage as the combination certainly
> appears to do all that I would want and more.
>
> Are there any other opinions about lightweight, maintained health check
> monitoring utilities, particularly monitoring drives owned by a PERC 6/i?
>
> --Larry
>
> --
> Larry Fahnoe, Fahnoe Technology Consulting, [email protected]
> <[email protected]>
> Minneapolis, Minnesota www.FahnoeTech.com
> <http://www.fahnoetech.com/>
> _______________________________________________
> Linux-PowerEdge mailing list
> [email protected]
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>
>

--
Larry Fahnoe, Fahnoe Technology Consulting, [email protected]
Minneapolis, Minnesota www.FahnoeTech.com
