Gianluca Cecchi <[email protected]> writes:

> hello,
> a blade M910 with RH EL 5.5 x86_64 and 6.2 agents.
> Installed these:
> libsmbios-2.2.19-10.1.el5.i386.rpm
> libsmbios-2.2.19-10.1.el5.x86_64.rpm
> RPM-GPG-KEY-dell
> RPM-GPG-KEY-libsmbios
> smbios-utils-bin-2.2.19-10.1.el5.x86_64.rpm
> srvadmin-cm-6.2.0-677.i386.rpm
> srvadmin-deng-6.2.0-1.6.el5.i386.rpm
> srvadmin-fsa-6.2.0-1.6.el3.i386.rpm
> srvadmin-hapi-6.2.0-1.17.el5.i386.rpm
> srvadmin-isvc-6.2.0-1.16.el5.i386.rpm
> srvadmin-megalib-6.2.0-1.6.el3.i386.rpm
> srvadmin-omacore-6.2.0-1.18.el5.i386.rpm
> srvadmin-omcommon-6.2.0-1.19.el5.i386.rpm
> srvadmin-omilcore-6.2.0-1.9.el5.noarch.rpm
> srvadmin-smcommon-6.2.0-1.29.el5.i386.rpm
> srvadmin-storage-6.2.0-1.29.el5.i386.rpm
> srvadmin-storage-populator-6.2.0-1.25.el3.i386.rpm
> srvadmin-storelib-6.2.0-1.11.el3.i386.rpm
> srvadmin-storelib-libpci-6.2.0-1.1.el5.i386.rpm
> srvadmin-storelib-sysfs-6.2.0-1.1.el5.i386.rpm
> srvadmin-sysfsutils-6.2.0-2.1.el5.i386.rpm
> srvadmin-xmlsup-6.2.0-1.17.el5.i386.rpm
>
> I have this strange output when querying the controller status; state is
> degraded, but OK...
>
> # omreport storage controller
> Controller PERC H200 Integrated Modular (Embedded)
>
> Controllers
> ID : 0
> Status : Non-Critical
> Name : PERC H200 Integrated Modular
> Slot ID : Embedded
> State : Degraded
> Firmware Version : 07.01.33.00
> Minimum Required Firmware Version : Not Applicable
> Driver Version : 01.101.06.00
> Minimum Required Driver Version : 02.00.00.00
> Storport Driver Version : Not Applicable
> Minimum Required Storport Driver Version : Not Applicable
> Number of Connectors : 1
> Rebuild Rate : 50%
> BGI Rate : 50%
> Check Consistency Rate : 50%
> Reconstruct Rate : Not Applicable
> Alarm State : Not Applicable
> Cluster Mode : Not Applicable
> SCSI Initiator ID : Not Applicable
> Cache Memory Size : Not Applicable
> Patrol Read Mode : Not Applicable
> Patrol Read State : Not Applicable
> Patrol Read Rate : Not Applicable
> Patrol Read Iterations : Not Applicable
> Abort check consistency on error : Not Applicable
> Allow Revertible Hot Spare and Replace Member : Not Applicable
> Auto replace member on predictive failure : Not Applicable
> Load balance : Not Applicable
> Security Capable : Not Applicable
> Security Key Present : Not Applicable
> Redundant Path view : Not Applicable
>
> using check_openmanage from
> http://folk.uio.no/trondham/software/check_openmanage.html
> with these options
>
> /usr/lib64/nagios/plugins/check_openmanage -o 0 --blacklist
> ctrl_driver=0/ctrl_stdr=0 -d
>
> System: PowerEdge M910
> ServiceTag: OMSA version: 6.2.0
> BIOS/date: 1.1.7 05/25/2010 Plugin version: 3.5.10
> -----------------------------------------------------------------------------
> Storage Components
> =============================================================================
> STATE | ID | MESSAGE TEXT
> ---------+----------+--------------------------------------------------------
> OK | 0 | Controller 0 [PERC H200 Integrated Modular] is Degraded
> OK | 0:0:0:0 | Physical Disk 0:0:0 [SAS-HDD 146GB] on ctrl 0 is Online
> OK | 0:0:0:1 | Physical Disk 0:0:1 [SAS-HDD 146GB] on ctrl 0 is Online
> OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 136.13 GB] is Ready
> OK | 0:0 | Connector 0 [SAS Port RAID Mode] on controller 0 is Ready
> OK | 0:0:0 | Enclosure 0:0:0 [Backplane] on controller 0 is Ready
> -----------------------------------------------------------------------------
>
> ....
>
> In general the standard check (without -d but with the blacklist option, as
> the driver release
> in RH EL 5 is a little behind the recommended..) returns ok.
>
> So the question is
> is it ok or not?

Both... see below.

> Entering directly into BIOS for the controller (Ctrl-C) gives Optimal as
> state....

In the BIOS, there is no driver. The reason behind the degraded state
in the OS is not present in the BIOS.

> Is this a bug related with the text output....?

The controller is degraded because the driver is too old: your omreport
output shows driver version 01.101.06.00, while the minimum required
version is 02.00.00.00. In the debug output, you'll see that the
controller is listed as "Degraded", and that its state is OK. This is
because check_openmanage would rather report the reason behind the
degraded state than the degraded state itself. I can see how this can
be confusing, but the plugin does this to avoid spamming the user.
There can be different reasons behind the degraded state, and there can
be more than one at the same time. For example, it can be any or all
of:

  - out of date driver
  - out of date firmware
  - out of date storport driver

The confusion is also a consequence of using blacklisting with the
Nagios plugin. In your case:

  - the driver is out of date
  - you have blacklisted this feature in the plugin

If the 'ctrl_driver' blacklisting keyword is used, and an out-of-date
driver is the _only_ thing that is "wrong" with the controller
(i.e. why it is degraded), the plugin will return OK for the controller
and the out-of-date driver alert is suppressed. You are also using the
'ctrl_stdr' keyword, so both the driver and the storport driver can be
out of date without the plugin giving an alert. The storport driver is
a Windows-only thing.

Try using the plugin without blacklisting, and you'll see that it
reports the out-of-date driver.

As I understand it, OMSA will put the controller in a degraded state if
it knows that there is a newer driver/firmware version available. It
does not mean that there is something wrong with the controller or even
that your current driver contains dangerous bugs. It only means that
there is a newer version available, and this is Dell's way of telling
you that you should upgrade. The blacklisting feature is there if
upgrading is not an option.

Hope this helps :)

PS. The argument to the '-o' option can be any integer, but only 1
(default), 2 and 3 have any effect. The '-o 0' option in your example
has no effect.

Cheers,
-- 
Trond H. Amundsen <[email protected]>
Center for Information Technology Services, University of Oslo

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
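
For anyone who finds the blacklisting behaviour described above easier
to follow as code, here is a rough Python sketch of the decision. It is
only a model of what the reply describes, not the plugin's actual code.
The function names, the message texts and the WARNING state are made up
for illustration. The 'ctrl_fw' keyword does not appear in this thread
and is included as an assumption. The blacklist string format is taken
from the command line quoted above.

```python
# Hypothetical model of the behaviour described in this thread.
# This is NOT the plugin's actual implementation.

# Reason for the degraded state -> blacklist keyword that silences it.
# 'ctrl_driver' and 'ctrl_stdr' come from the thread; 'ctrl_fw' is an
# assumption added for illustration.
REASON_KEYWORD = {
    "driver": "ctrl_driver",
    "firmware": "ctrl_fw",
    "storport": "ctrl_stdr",
}


def parse_blacklist(spec):
    """Parse e.g. 'ctrl_driver=0/ctrl_stdr=0' into {keyword: {ids}}."""
    result = {}
    for item in spec.split("/"):
        if not item:
            continue
        keyword, _, ids = item.partition("=")
        result.setdefault(keyword, set()).update(ids.split(","))
    return result


def controller_state(ctrl_id, reasons, blacklist):
    """Return (state, messages) for a controller that OMSA calls Degraded.

    reasons: list of keys from REASON_KEYWORD explaining the degradation.
    """
    if not reasons:
        # Assumption: with no known reason, alert on the state itself.
        return "WARNING", [f"Controller {ctrl_id} is Degraded"]

    # Keep only the reasons that are not blacklisted for this controller.
    remaining = [
        r for r in reasons
        if ctrl_id not in blacklist.get(REASON_KEYWORD[r], set())
    ]
    if remaining:
        # Report the reasons rather than the degraded state.
        return "WARNING", [
            f"Controller {ctrl_id}: {r} is out of date" for r in remaining
        ]
    # Every reason is blacklisted: still Degraded, but reported as OK.
    return "OK", [f"Controller {ctrl_id} is Degraded"]


if __name__ == "__main__":
    bl = parse_blacklist("ctrl_driver=0/ctrl_stdr=0")
    print(controller_state("0", ["driver"], bl))
    print(controller_state("0", ["driver"], {}))
```

With the blacklist from the original command line, the first call
models the "Degraded but OK" line in the debug output. The second call
models running without blacklisting, where the out-of-date driver is
reported instead.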
