Rob Johnston wrote:
> Hello and happy new year!
>
> What follows is a proposal that Eric Schrock and I have been working on
> to define a mechanism to enumerate power supplies and fans on platforms
> that support IPMI. This is a first step in the larger Sensor
> Abstraction Layer project.
>
> Comments and questions are encouraged.
>
> --------------------------------------------------------------------
>
> 1. DESCRIPTION
>
> The Solaris FMA framework is designed to diagnose failures in system
> components. Currently these components are discovered by probing the
> hardware visible to Solaris via standard OS paths (I/O, CPU, DIMMs,
> etc). However, there exists a set of components that are crucial to the
> ongoing health of the system that have no connection visible to Solaris.
> The most common components, and the most likely to encounter failures,
> are power supplies and fans.
>
> On low-end hardware, these components are often not observable, and it
> is the responsibility of the user to manually detect component failure,
> or run custom (Windows) software to observe the system. Higher end
> systems (such as the x4000 series shipped by Sun) have a service
> processor that manages the physical components and sensors in the
> system. Some systems (such as SPARC) have a custom communications
> mechanism between the OS and the SP, but the industry standard is IPMI
> (Intelligent Platform Management Interface). Solaris already has the
> ability to communicate with the SP over the baseboard management
> controller (/dev/bmc), and a basic library (libipmi) already exists.
>
> Integrating support for power supplies and fans within FMA is an
> important step in bringing all hardware topology enumeration and
> diagnosis under a single infrastructure. Without this ability, users
> must manage a separate OS instance (on the SP) with different
> configuration, separate management, and separate notification
> mechanisms.
>
> This proposal adds basic enumeration support for power supplies and fans
> on platforms supporting IPMI. It does not include the ability to
> diagnose psu or fan failures, nor does it provide a way to read
> environmental sensors (fan speed, etc) for these components. This
> functionality will be provided by a future project.
>
>
> 2. TOPOLOGY CHANGES
>
> On x86 systems, the root of the hc topology tree is hc:///motherboard=0
> (though bay nodes can exist at the root level as well). It doesn't make
> sense to have physical components like fans underneath the motherboard,
> nor does it make sense to have them directly at the root level. Future
> projects will add sensors that monitor the chassis itself, and the
> components are contained within the chassis, so a new root hc node is
> created:
>
>     hc:///chassis=0
>
> There is only ever a single chassis.
I have some heartburn over this statement. I think it will be confusing
in those cases where a set of domains may share the same physical
chassis (e.g. blade servers). Perhaps we need a different name, other
than "chassis"?
 -- richard

> Within IPMI, fans and psus can be
> grouped together into domains that represent a logical unit (typically a
> FRU). While uncommon for power supplies, this is quite common for fan
> modules or fan trays that contain multiple fans. Therefore a
> multi-level topology will be created of the form:
>
>     hc:///chassis=0/psu=0
>     hc:///chassis=0/psu=1
>     hc:///chassis=0/powermodule=0
>     hc:///chassis=0/powermodule=0/psu=0
>     hc:///chassis=0/powermodule=0/psu=1
>
>     hc:///chassis=0/fan=0
>     hc:///chassis=0/fan=1
>     hc:///chassis=0/fanmodule=0
>     hc:///chassis=0/fanmodule=0/fan=0
>     hc:///chassis=0/fanmodule=0/fan=1
>
> The IPMI components are technically 'cooling' elements, not fans. For
> the systems which currently support Solaris and IPMI, only fans are
> supported. In the future, we may be able to detect non-fan cooling
> elements by examining the set of associated sensors (such as a
> tachometer) and inferring the type of cooling element.
>
> With IPMI, we know all components, even if a component is not currently
> present. To allow management software to detect empty component slots,
> the FMRIs will always be enumerated, but the is_present method will
> return false if the component is not currently present.
>
>
> 3. DYNAMIC ENUMERATION
>
> A new common libtopo module, ipmi, will be provided that will do dynamic
> enumeration of IPMI components. While currently only supported on x86
> systems, any system supporting IPMI should work, so the module will be
> present on all architectures. If future SPARC platforms support IPMI
> over /dev/bmc, then everything should "just work".
>
> IPMI has the unusual property that the world is defined solely by
> 'sensor descriptor records' (which may be sensors, FRUs, etc).
> Instead of iterating over entities (the IPMI term for components), one instead
> iterates over all SDR records and infers an entity's existence based on
> the sensor records that refer to it. The logic to handle this will be
> kept within libipmi, and the ipmi enumerator will iterate over all
> discovered entities for any 'power domain', 'power supply', 'cooling
> domain', or 'cooling unit' entities. Using IPMI entity association
> records, libipmi will have already organized these into the appropriate
> hierarchy.
>
> The default label for each entity will be based on the entity id and the
> entity instance number (which is globally unique). These labels may or
> may not correspond to the labels on the chassis, but under a correct
> IPMI implementation they will be roughly correct, and there will be a
> means to override them on a per-platform basis (see below). For
> components with a FRU locator record, it may be possible to assign a
> label matching the FRU name, such as 'ft0.fm1.fru', though it's unclear
> if this is any better (the naming is entirely up to the SP, and the
> '.fru' extension is just a convention currently used by the current SP
> firmware).
>
> Each component that is directly under the chassis will be assigned a FRU
> matching its resource. Components within an association will default to
> the FRU of their parent, unless they have associated FRU locator
> records, in which case they will have a distinct FRU matching their
> resource.
>
> The sensors associated with the entity will be used to determine
> presence as described in the IPMI specification.
>
> 4. STATIC ENUMERATION
>
> It would be nice if dynamic enumeration were enough to model any system
> supporting IPMI. Unfortunately, as is the case with most platform
> technologies (such as SMBIOS), complete support for enumeration is
> hampered by limitations of the specification as well as the
> implementation.
> With a proper implementation of the IPMI spec, it is
> possible to enumerate all the components, though attaching semantic
> meaning to them (labels, failure sensors, etc) is only possible in some
> cases.
>
> On top of this, most platforms have an IPMI implementation that leaves
> something to be desired. A common problem is the lack of entity
> association records, so fans that should be part of a logical module
> (even if correctly represented via SDR records) are not associated with
> one another. Other problems include presence sensors that reference
> incorrect entities, missing or incorrect FRU locator records, etc.
>
> To compensate for both of these problems, libtopo will support
> dynamic enumeration, static enumeration, and static assignment of sensors
> and properties to dynamically discovered entities.
>
>
> 5. LIBIPMI DETAILS
>
> As part of this work, libipmi will be expanded in several different
> capacities, mostly related to parsing SDR records and representing
> entities.
>
> The SDR infrastructure will be expanded to support all possible SDR
> record types (compact sensor, full sensor, entity association, etc).
> The code will also be simplified to separate out the SDR name (when
> available) from the record, since constructing this value is non-trivial
> and should not be left to the consumer.
>
> New interfaces for gathering sensor readings based on a compact or full
> SDR record will be introduced. This consists mainly of a large number
> of #defines, code to transform readings based on the linearization
> function, and parsing the sensor units. Some of this infrastructure
> will not be fully used until future sensor work is complete, but enough
> of it is needed at this point (namely parsing sensor-specific state
> masks) to warrant its inclusion as part of this project.
>
> Based on this new infrastructure, libipmi will be enhanced to have a
> native notion of entities, even though these do not exist as such in the
> IPMI specification.
> The library will scan the SDR records, detect referenced
> entities, group sensors with associated entities, and parse entity
> association records to create a hierarchy of entities. This will also
> include a function to detect entity presence.
>
> This isolates the details of IPMI entities (of which there are many) to
> within libipmi, simplifying the topo enumerator and allowing other
> software to be developed on top of it. One of these pieces of software
> will be a private utility under /usr/lib/fm, 'ipmitopo', which will
> display all IPMI entities (id, type, presence) and sensors associated
> with each entity (reading, state, type, etc). This tool is not designed
> to replace the open source 'ipmitool' and exists solely to debug the
> IPMI topo implementation by leveraging the same code used by libtopo.
>
>
> 6. LIBTOPO ENHANCEMENTS
>
> To make the implementation of this project possible, a handful of
> extensions to both the libtopo enumerator module API and XML schema are
> necessary.
>
> Currently it is not possible to register module methods on nodes that
> are statically enumerated via XML map files. Typically, node methods
> are registered onto a node by the enumerator module after the node is
> bound to the topology. However, since statically enumerated nodes
> aren't created by the enumerator module, this registration doesn't occur.
>
> While there will be cases where we will be forced to statically define
> psu and fan topologies via XML, these nodes still need to support the
> node methods that are implemented by the ipmi enumerator module.
> In order to allow these methods to be registered on statically defined
> nodes, the topo_modops_t struct will be extended with a new operation
> (tmo_meth_reg) as shown below:
>
>     typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);
>
>     typedef struct topo_modops {
>         topo_enum_f *tmo_enum;          /* enumeration op */
>         topo_release_f *tmo_release;    /* resource release op */
>         topo_meth_reg_f *tmo_meth_reg;  /* method registration op */
>     } topo_modops_t;
>
> The tmo_meth_reg operation will be optional. Enumerator modules
> which implement this operation will register the appropriate set of
> methods on the topo node that is passed in.
>
> To provide a connection between this new operation and nodes that are
> statically defined in XML, the syntax of the <node> element will be
> extended to include a new optional "mod" attribute. The value of this
> attribute should be set to the name of an enumerator module, whose methods
> should be registered on that node. Below is an example usage of this
> new attribute:
>
>     <range name='fan' min='0' max='2'>
>         <node instance='0' mod='ipmi'>
>             . . .
>         </node>
>     </range>
>
> Additionally, the syntax of the <range> element will also be extended to
> allow a new "set" attribute. The intention is to allow for conditional
> enumeration of a range of nodes based on the platform type. This is
> analogous to the conditional specification of properties which is
> currently supported via the <propset> element. Below is an example
> usage of this new attribute:
>
>     <range name='fanmodule' min='0' max='4'
>         set='Sun-Fire-X4500|Sun-Fire-X4540'>
>         . . .
>     </range>
>
> In the example above, the <range> element (and all children elements
> within) will only be parsed and evaluated if the machine's platform type
> matches one of the platforms specified by the "set" attribute's value.
>
> All of the above extensions will be backwards compatible with any
> existing map files and enumerator modules.
>
>
> 7. FUTURE WORK
>
> This proposal lays the groundwork for a variety of future work under the
> auspices of the FMA Sensor Framework.
>
> The next step will be to include fan and PSU diagnosis. This requires
> representing failure sensors within libtopo using the facility nodes
> proposed as part of the sensor framework. These sensors are then read
> by a sensor-transport module that has a 1:1 correspondence between
> ereports and faults.
>
> This will serve as a proof of concept for facility nodes and prepare
> the way for the larger sensor and alert framework, while providing the
> greatest immediate benefit. Future work will include representing
> analog sensors in libtopo, developing an environmental monitor,
> detecting fan and PSU hotplug, and creating a persistent alert
> framework.
>
>
> 8. REFERENCES
>
> "IPMI v2.0 rev. 1.0 specification markup for IPMI v2.0/v1.5 errata
> revision 3"
>
> http://www.intel.com/design/servers/ipmi/pdf/IPMIv2_0_rev1_0_E3_markup.pdf
>
> Sensor Abstraction Layer OpenSolaris Project
>
> http://www.opensolaris.org/os/project/sensors/
>
> Libtopo documentation: FMD Programmer's Reference, Chapter 9
>
> http://www.opensolaris.org/os/community/fm/FMDPRM.pdf
>
> _______________________________________________
> fm-discuss mailing list
> fm-discuss@opensolaris.org
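
As an illustration only (this is not part of the proposal), the platform
matching described for the "set" attribute in section 6 might look
something like the sketch below. The helper name platform_in_set is
hypothetical, and the assumption that "set" is a '|'-separated list of
names compared by exact match is mine; the actual libtopo XML parser may
well handle this differently.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libtopo interface): return 1 if 'platform'
 * exactly matches one of the '|'-separated names in 'set', 0 otherwise.
 * Exact, case-sensitive matching is an assumption of this sketch.
 */
int
platform_in_set(const char *set, const char *platform)
{
	const char *p = set;
	size_t plen;

	if (set == NULL || platform == NULL)
		return (0);

	plen = strlen(platform);
	if (plen == 0)
		return (0);		/* an empty platform name never matches */

	while (*p != '\0') {
		/* Length of the current name, up to the next '|' or the end. */
		size_t len = strcspn(p, "|");

		if (len == plen && strncmp(p, platform, len) == 0)
			return (1);

		p += len;
		if (*p == '|')
			p++;		/* step over the separator */
	}

	return (0);
}
```

With the fanmodule range quoted in section 6, a call such as
platform_in_set("Sun-Fire-X4500|Sun-Fire-X4540", "Sun-Fire-X4540")
returns 1, so the range would be evaluated; for any other platform name
it returns 0 and the range would be skipped.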