Thank you all for your input!
@Andy: Once we know that a system has experienced some sort of failure, it's
easy enough to figure out what the issue is. Currently, though, we have no
way of knowing whether there are memory or CPU fault conditions. It's easy
enough to use the SDR to detect fan, PSU, temperature, and voltage issues, but
for anything outside the SDR it's fuzzier. If we knew every issue that might
be detected and logged in the SEL, it would be easy to script something that
parses that output and alerts accordingly. However, whitelisting or
blacklisting against that output isn't an ideal solution, so I was hoping for
a single query that could tell us whether the BMC considers the system
healthy.
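A minimal sketch of the parse-and-alert approach described above, in Python.
It assumes `ipmitool sel elist` prints one pipe-separated record per line
(record ID | date | time | sensor | event | direction); the exact columns can
vary between ipmitool versions and BMCs, so check your own output first.
`FAULT_KEYWORDS`, `parse_sel_line`, `find_faults` and `read_sel` are
hypothetical names, and the keyword list is exactly the kind of per-vendor
blacklist that has to be maintained by hand.

```python
import subprocess
from typing import Iterable, List

# Hypothetical watch list: lowercase substrings that mark an event as a fault.
# Illustrative only; build the real list from your vendors' SEL output.
FAULT_KEYWORDS = ("ecc", "ierr", "machine check", "failure", "critical")


def parse_sel_line(line: str) -> dict:
    """Split one pipe-separated SEL line into named fields (assumed layout)."""
    fields = [f.strip() for f in line.split("|")]
    keys = ("id", "date", "time", "sensor", "event", "direction")
    # zip() stops at the shorter sequence, so short lines simply lack keys.
    return dict(zip(keys, fields))


def find_faults(lines: Iterable[str], keywords=FAULT_KEYWORDS) -> List[dict]:
    """Return parsed SEL entries whose sensor or event text matches a keyword."""
    faults = []
    for line in lines:
        if not line.strip():
            continue
        entry = parse_sel_line(line)
        # Deasserted events mean the condition has cleared; ignore them.
        if entry.get("direction", "").lower() == "deasserted":
            continue
        text = (entry.get("sensor", "") + " " + entry.get("event", "")).lower()
        if any(k in text for k in keywords):
            faults.append(entry)
    return faults


def read_sel() -> List[str]:
    """Run `ipmitool sel elist` on the local BMC and return its output lines."""
    out = subprocess.run(["ipmitool", "sel", "elist"],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


if __name__ == "__main__":
    # Sample lines stand in for read_sel() so this runs without a BMC.
    sample = [
        "   1 | 04/12/2012 | 10:47:00 | Memory #0x02 | Correctable ECC | Asserted",
        "   2 | 04/12/2012 | 10:48:00 | Power Supply #0x51 | Presence detected | Asserted",
    ]
    for f in find_faults(sample):
        print(f["id"], f["sensor"], f["event"])
```

On a real host, pass `read_sel()` to `find_faults()` and alert if the result
is non-empty.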
@Randy: I was actually just in the process of evaluating that option. At
least on our Dells, I know it's possible through Platform Events.
Thanks again to all,
- Tim
From: "Schafer, Randy A" <randy.a.scha...@intel.com>
Date: Thu, 12 Apr 2012 19:10:46 +0000
To: Andy Cress <andy.cr...@us.kontron.com>, Ryan Cox <ryan_...@byu.edu>,
Timothy Gelter <timo...@gelter.com>
Cc: "ipmitool-devel@lists.sourceforge.net"
<ipmitool-devel@lists.sourceforge.net>
Subject: RE: [Ipmitool-devel] generic test for system component failure
I don’t know to what extent Dell and HP support alarms, but you might be able
to have the servers generate SNMP or email alarms on health issues.
Randy Schafer
EPSD Firmware Engineering
Intel Corporation
randy.a.scha...@intel.com
(503) 712-3893
From: Andy Cress [mailto:andy.cr...@us.kontron.com]
Sent: Thursday, April 12, 2012 11:53 AM
To: Ryan Cox; Timothy Gelter
Cc: ipmitool-devel@lists.sourceforge.net
Subject: Re: [Ipmitool-devel] generic test for system component failure
Tim,
The Chassis Identify LED has a standard IPMI command to set it, but
OEM-specific commands to read it.
The system health/fault LED (if present) is entirely custom and
OEM-specific.
Even if there were a common way to read that health LED, what would that
really accomplish? You would still have to also read the corresponding SEL
event to know what happened.
Andy
---
On 04/12/2012 10:47 AM, Timothy Gelter wrote:
Hello Andy,
I appreciate your quick response!
The systems I'm currently targeting are all Dell (C1100, C6220, and R620)
but I'd also want to be able to monitor a variety of HP servers (DL360,
DL185, BL460, & more) in this same way.
I was hoping not to have to parse the SEL, but that's what I'll do if it's
my only option.
Thanks,
- Tim
-----
Tim,
The ‘system health light’ will be different for each chassis vendor. Which
chassis vendor is this?
In any case, parsing the IPMI SEL (waiting for IPMI events) is the surest
way to detect faults on an IPMI-capable system. That’s the trigger for the
firmware to turn on the system health light.
Andy
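A rough sketch of "waiting for IPMI events" by polling, in Python. It assumes
the same one-record-per-line `ipmitool sel elist` output as above and treats
each whole line as an entry's identity, so an entry is reported once, on the
first poll where it appears. `read_sel_lines`, `new_entries`, `watch` and the
60-second interval are hypothetical choices; push-based alerting (SNMP or
email alarms from the BMC) would avoid polling altogether.

```python
import subprocess
import time
from typing import Iterable, List, Set, Tuple


def read_sel_lines() -> List[str]:
    """Run `ipmitool sel elist` locally and return its non-empty lines."""
    out = subprocess.run(["ipmitool", "sel", "elist"],
                         capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line.strip()]


def new_entries(lines: Iterable[str], seen: Set[str]) -> Tuple[List[str], Set[str]]:
    """Return lines absent from the previous poll, plus the updated seen-set.

    The whole line (record ID, timestamp and description) is the key, so a
    record ID reused after a SEL clear still counts as new when its
    timestamp or text differs.
    """
    fresh: List[str] = []
    current: Set[str] = set()
    for line in lines:
        key = line.strip()
        if not key:
            continue
        current.add(key)
        if key not in seen:
            fresh.append(line)
    # Return only what is in the SEL now, so cleared entries are forgotten.
    return fresh, current


def watch(poll_seconds: float = 60.0) -> None:
    """Poll the SEL forever and print entries that appear between polls."""
    seen: Set[str] = set()
    first_poll = True
    while True:
        fresh, seen = new_entries(read_sel_lines(), seen)
        # The first poll only records the existing backlog.
        if not first_poll:
            for line in fresh:
                print("NEW SEL EVENT:", line.strip())
        first_poll = False
        time.sleep(poll_seconds)
```

Calling `watch()` on a host with ipmitool access blocks and prints each new
SEL entry; in practice the print would be replaced by whatever alerting you
already use.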
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel