Re: [Linux-PowerEdge] Looking for a simple health check for T110-II systems

Larry Fahnoe Tue, 05 Sep 2017 05:22:40 -0700

Thanks Ernst, I didn't realize that smartctl could peek behind the RAID
controllers; obviously I didn't read the man page! I'd noticed the smartd
messages to the effect of "not capable of SMART Health Status check" but
didn't dig deeper. Live and learn. Interestingly, in my current case with a
failed drive, smartctl -H and -a both show the overall-health
self-assessment as PASSED for all of the drives, though maybe it would have
alerted before the drive failed.
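
For anyone who wants to script the per-drive check described in the quoted
message below, here is a minimal sketch rather than a tested tool. It assumes
MegaCli64 and smartctl are both in PATH, that it runs as root, and that
/dev/sda is a device exposed by the controller. The function names
(extract_device_ids, check_drive, check_all_drives) are made up for
illustration, and the "SMART Health Status: OK" pattern is included on the
assumption that SAS drives report health that way instead of PASSED.

```shell
#!/bin/sh
# Sketch: run "smartctl -H" against every drive behind a MegaRAID/PERC
# controller. Assumes MegaCli64 and smartctl are installed and in PATH.

# Read MegaCli physical-drive output on stdin and print each numeric id
# from lines of the form "Device Id: N".
extract_device_ids() {
    awk '/^Device Id/ { print $NF }'
}

# Hypothetical helper: check one device id and print a one-line verdict.
# The device name defaults to /dev/sda, as in the quoted example.
check_drive() {
    id=$1
    dev=${2:-/dev/sda}
    if smartctl -H -d "megaraid,$id" "$dev" |
        grep -Eq 'PASSED|SMART Health Status: OK'; then
        echo "device id $id: healthy"
    else
        echo "device id $id: NOT healthy (inspect with smartctl -a)"
        return 1
    fi
}

# Loop over every device id the controller reports; the return status is
# non-zero if any drive did not report as healthy.
check_all_drives() {
    status=0
    for id in $(MegaCli64 -Pdlist -aAll | extract_device_ids); do
        check_drive "$id" || status=1
    done
    return $status
}

# Only run the check when both tools are actually installed.
if command -v MegaCli64 >/dev/null 2>&1 &&
    command -v smartctl >/dev/null 2>&1; then
    check_all_drives
fi
```

Run from cron, the non-zero exit status could be used to trigger a mail, but
as the failed drive here shows, a healthy verdict is not a guarantee.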


At this point I've got OMSA installed on both systems and have begun to
work on getting an NMS installed. Nagios with nagios-selinux from EPEL is
not working reliably with SELinux, so I've begun to look into Icinga2, which
looks to be able to work with Nagios plugins and, hopefully, check_openmanage.

Even if I do get OMSA and an NMS working, this endeavor has encouraged me
to learn about megacli and smartctl & more learning is always a good thing!

--Larry

On Tue, Sep 5, 2017 at 3:51 AM, Ernst Pijper <[email protected]>
wrote:

> One possibility is to use smartctl to check the health of the individual
> drives:
>
> smartctl -H -d megaraid,<N> /dev/sda
>
> where <N> is the device id returned by MegaCli:
>
> MegaCli64 -Pdlist -aAll | grep "Device Id"
>
> You can use the same device name, /dev/sda, for all device ids; the
> command only considers the device id.
>
> Output will be something like:
>
> # smartctl -H -d megaraid,5 /dev/sda
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.28.3.el7.x86_64] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> /dev/sda [megaraid_disk_05] [SAT]: Device open changed type from 'megaraid,5' to 'sat+megaraid,5'
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Warning: This result is based on an Attribute check.
>
> Replace -H with -a for a more detailed analysis of the disk status.
>
> Don’t expect a 100% guarantee that smartctl will correctly predict
> failures.
>
> Ernst
>
> On 4 sep. 2017, at 14:07, Larry Fahnoe <[email protected]> wrote:
>
> Thanks for the thoughts and feedback! So far, it looks as though the
> conventional wisdom is to bite the bullet and install OMSA on the servers
> and then set up an NMS like Nagios. I've had troubles with OMSA in the past
> and I don't currently have an NMS running since my environments are rather
> small. Perhaps OMSA has improved; I will investigate it once again and
> reconsider Nagios and check_openmanage, as the combination certainly
> appears to do all that I would want and more.
>
> Are there any other opinions about lightweight, maintained health check
> monitoring utilities, particularly monitoring drives owned by a PERC 6/i?
>
> --Larry
>
> --
> Larry Fahnoe, Fahnoe Technology Consulting, [email protected]
>            Minneapolis, Minnesota       www.FahnoeTech.com
> _______________________________________________
> Linux-PowerEdge mailing list
> [email protected]
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>
>
>


-- 
Larry Fahnoe, Fahnoe Technology Consulting, [email protected]
           Minneapolis, Minnesota       www.FahnoeTech.com

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
