Hi Werner,

> Currently I'm using ipmitool for the "Nagios IPMI Sensor Monitoring
> Plugin" www.thomas-krenn.com/ipmi-plugin
> But it seems to me that ipmimonitoring from freeipmi has the benefit
> that it reports things like failed power supplies in a better way to
> parse it for a Nagios/Icinga plugin - I can just parse the fourth column
> and report OK back to Nagios/Icinga only when I get a "Nominal"
> there...
> So maybe I should write another version of the Plugin which uses
> ipmimonitoring instead of ipmitool...

FYI, I wrote this plugin for Nagios a while back.

http://www.gnu.org/software/freeipmi/nagios_ipmimonitoring.pl

Perhaps a decent place to start? (There'll be a new one when FreeIPMI
0.9.1 comes out, although I *think* it'll be forward compatible.)

Al

On Tue, 2010-06-22 at 14:28 -0700, Werner Fischer wrote:
> Hi Al - thank you for your valuable feedback again,
>
> On Tue, 2010-06-22 at 10:44 -0700, Albert Chu wrote:
> > Hi Werner,
> >
> > Thanks. You are using a slightly older version of FreeIPMI (I can tell
> > from the output format), so some of the comments below are related to
> > newer versions.
> you are right, I use 0.7.15-2 which ships with Ubuntu 10.04
>
> >
> > On Tue, 2010-06-22 at 04:16 -0700, Werner Fischer wrote:
> > > Hi Al,
> > >
> > > ipmimonitoring seems to be very useful for my needs. I gave it a try
> > > with an Intel SR2500 server. I unplugged one power cord from Power
> > > Supply 1 (PS1) and removed the cover of the chassis:
> > >
> > > ipmimonitoring reports "Critical" in the fourth column, which is great:
> > > wfisc...@wfischer-t410-ubuntu:~$ ipmimonitoring -h 192.168.1.211
> > > -u monitor -p relation -l user | grep "| Critical |"
> > > 33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy
> > > Lost' 'Non-redundant:Sufficient Resources from Redundant'
> > > 36 | Physical Scrty | Physical Security | Critical | N/A |
> > > 'General Chassis Intrusion'
> > > 49 | PS1 Status | Power Supply | Critical | N/A | 'Presence
> > > detected' 'Power Supply input lost (AC/DC)'
> > > wfisc...@wfischer-t410-ubuntu:~$
> > >
> > > With ipmitool I got an "ok" for these sensors:
> > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211
> > > -U monitor -P relation -L user sdr elist
> > > [...]
> > > PS1 AC Current | 78h | ok | 10.1 | 0.12 Amps
> > > PS2 AC Current | 79h | ok | 10.2 | 0.93 Amps
> > > PS1 +12V Current | 7Ah | ok | 10.1 | 0 Amps
> > > PS2 +12V Current | 7Bh | ok | 10.2 | 16 Amps
> > > PS1 +12V Power | 7Ch | ok | 10.1 | 0 Watts
> > > PS2 +12V Power | 7Dh | ok | 10.2 | 192 Watts
> > > P1 Therm Margin | 99h | ok | 3.1 | -49 degrees C
> > > P2 Therm Margin | 9Bh | ok | 3.2 | -54 degrees C
> > > P1 Therm Ctrl % | C0h | ok | 3.1 | 0 unspecified
> > > P2 Therm Ctrl % | C1h | ok | 3.2 | 0 unspecified
> > > Proc 1 Vccp | D0h | ok | 3.1 | 1.23 Volts
> > > Proc 2 Vccp | D1h | ok | 3.2 | 1.23 Volts
> > > Mem Therm Margin | 48h | ns | 3.2 | No Reading
> > > Pwr Unit Stat | 01h | ok | 21.1 |
> > > Power Redundancy | 02h | ok | 21.1 | Redundancy Lost,
> > > Non-Redundant: Sufficient from Redundant
> > > BMC Watchdog | 03h | ok | 7.1 |
> > > Platform Secu V | 04h | ok | 7.1 |
> > > Physical Scrty | 05h | ok | 23.1 | General Chassis intrusion
> > > [...]
> > >
> > > Another test with ipmimonitoring, when PS1 is completely removed:
> > > wfisc...@wfischer-t410-ubuntu:~$ ipmimonitoring -h 192.168.1.211
> > > -u monitor -p relation -l user | grep "| Critical |"
> > > 32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
> > > 33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy
> > > Lost' 'Non-redundant:Sufficient Resources from Redundant'
> > > [...]
> > > 49 | PS1 Status | Power Supply | Nominal | N/A | 'OK'
> > > 50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence
> > > detected'
> > >
> > > (Here ipmimonitoring says 'OK' in the last column, VMware says
> > > "Unknown" when a power supply is not installed - see
> > > http://www.wefi.net/shared/sr2500-example-1.png)
> > >
> > It does depend on how the sensor is implemented.
> > Here's a layman's idea
> > of what a power supply sensor can report:
> >
> > A) sensor reading not available
> > B) sensor reading available, reports nothing
> > C) sensor reading available, reports presence detected
> > D) sensor reading available, reports something wrong (e.g. AC lost)
> >
> > A, C, & D map to obvious outputs (N/A vs "presence detected" vs "AC
> > input lost"). B is the one that's hard to deal with. On some
> > motherboards, "reports nothing" means the same as "presence
> > detected" (the sensor reports A, B, or D, but not C). On some other
> > motherboards "reports nothing" is the same as "N/A" (the sensor reports
> > B, C, or D, but not A). I currently map "reports nothing" to "OK",
> > which is the same output as many other sensors.
> Thanks for this info.
>
> > Not knowing much about the sensor software you're using, I would bet
> > that VMware knows the behavior of their own hardware and has programmed
> > something unique for it.
> The hardware I used for testing is an Intel SR2500 - so it's not
> VMware's own hardware. I think VMware does it in a way very similar to
> yours described in /etc/ipmi_monitoring_sensors.conf below.
>
> > > My question: how do you distinguish in ipmimonitoring which of the
> > > assertion states are ok ("Nominal") and which are not ("Critical")?
> >
> > You should find a config file /etc/ipmi_monitoring_sensors.conf which
> > lists the defaults. You can then tweak as appropriate for your system.
> Oh, that's great. That was exactly the thing which I was looking for!
> Really great!
>
> > Side note, whenever I release FreeIPMI 0.9.1, the tool ipmimonitoring
> > will disappear and become a symlink to 'ipmi-sensors
> > --output-sensor-state' and /etc/ipmi_monitoring_sensors.conf will
> > become /etc/freeipmi_interpret_sensor.conf.
> Thanks for this info.
>
> Currently I'm using ipmitool for the "Nagios IPMI Sensor Monitoring
> Plugin" www.thomas-krenn.com/ipmi-plugin
> But it seems to me that ipmimonitoring from freeipmi has the benefit
> that it reports things like failed power supplies in a better way to
> parse it for a Nagios/Icinga plugin - I can just parse the fourth column
> and report OK back to Nagios/Icinga only when I get a "Nominal"
> there...
> So maybe I should write another version of the Plugin which uses
> ipmimonitoring instead of ipmitool...
>
> Thanks and best regards,
> Werner
>
> >
> > Al
> >
> > > Thanks a lot for your great help,
> > > best regards,
> > > Werner
> > >
> > > PS: here is the full output of ipmimonitoring from my first test:
> > > wfisc...@wfischer-t410-ubuntu:~$ ipmimonitoring -h 192.168.1.211 -u
> > > monitor -p relation -l user
> > > Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor Units
> > > | Sensor Reading
> > > 1 | BB +1.2V Vtt | Voltage | Nominal | V | 1.197000
> > > 2 | BB +1.5V AUX | Voltage | Nominal | V | 1.466400
> > > 3 | BB +1.5V | Voltage | Nominal | V | 1.482000
> > > 4 | BB +1.8V | Voltage | Nominal | V | 1.785000
> > > 5 | BB +3.3V | Voltage | Nominal | V | 3.354000
> > > 6 | BB +3.3V STB | Voltage | Nominal | V | 3.354000
> > > 7 | BB +1.5V ESB | Voltage | Nominal | V | 1.505400
> > > 8 | BB +5V | Voltage | Nominal | V | 5.070000
> > > 9 | BB +12V AUX | Voltage | Nominal | V | 11.904000
> > > 10 | BB +0.9V | Voltage | Nominal | V | 0.897600
> > > 11 | Serverboard Temp | Temperature | Nominal | C | 29.000000
> > > 12 | Ctrl Panel Temp | Temperature | Nominal | C | 25.000000
> > > 13 | Fan 1 | Fan | Nominal | RPM | 5891.000000
> > > 14 | Fan 2 | Fan | Nominal | RPM | 6278.000000
> > > 15 | Fan 3 | Fan | Nominal | RPM | 5805.000000
> > > 16 | Fan 4 | Fan | Nominal | RPM | 6321.000000
> > > 17 | Fan 5 | Fan | Nominal | RPM | 9052.000000
> > > 18 | Fan 6 | Fan | Nominal | RPM | 8060.000000
> > > 19 | PS1 AC Current | Current | Nominal | A |
> > > 0.124000
> > > 20 | PS2 AC Current | Current | Nominal | A | 0.992000
> > > 21 | PS1 +12V Current | Current | Nominal | A | 0.000000
> > > 22 | PS2 +12V Current | Current | Nominal | A | 15.000000
> > > 23 | PS1 +12V Power | N/A | Nominal | W | 0.000000
> > > 24 | PS2 +12V Power | N/A | Nominal | W | 192.000000
> > > 25 | P1 Therm Margin | Temperature | Nominal | C | -49.000000
> > > 26 | P2 Therm Margin | Temperature | Nominal | C | -53.000000
> > > 27 | P1 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000
> > > 28 | P2 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000
> > > 29 | Proc 1 Vccp | Voltage | Nominal | V | 1.227600
> > > 30 | Proc 2 Vccp | Voltage | Nominal | V | 1.233800
> > > 32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
> > > 33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost'
> > > 'Non-redundant:Sufficient Resources from Redundant'
> > > 34 | BMC Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
> > > 35 | Platform Secu V | Platform Security Violation Attempt | Nominal |
> > > N/A | 'OK'
> > > 36 | Physical Scrty | Physical Security | Critical | N/A | 'General
> > > Chassis Intrusion'
> > > 37 | FP Interrupt | Critical Interrupt | Nominal | N/A | 'OK'
> > > 38 | Event Log Disabl | Event Logging Disabled | Nominal | N/A | 'OK'
> > > 40 | System Event | System Event | Nominal | N/A | 'OK'
> > > 41 | BB Vbat | Battery | Nominal | N/A | 'OK'
> > > 42 | Fan 1 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 43 | Fan 2 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 44 | Fan 3 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 45 | Fan 4 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 46 | Fan 5 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 47 | Fan 6 Present | Fan | Nominal | N/A | 'Device Inserted/Device
> > > Present'
> > > 48 | Fan Redundancy | Fan | Nominal | N/A | 'Fully
> > > Redundant'
> > > 49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected'
> > > 'Power Supply input lost (AC/DC)'
> > > 50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
> > > 51 | ACPI State | System ACPI Power State | Nominal | N/A | 'S0/G0'
> > > 52 | Button | Button/Switch | Nominal | N/A | 'OK'
> > > 56 | Processor 1 Stat | Processor | Nominal | N/A | 'Processor Presence
> > > detected'
> > > 57 | Processor 2 Stat | Processor | Nominal | N/A | 'Processor Presence
> > > detected'
> > > 58 | PCIe Link0 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 59 | PCIe Link1 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 60 | PCIe Link2 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 61 | PCIe Link3 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 62 | PCIe Link4 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 63 | PCIe Link5 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 64 | PCIe Link6 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 65 | PCIe Link7 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 66 | PCIe Link8 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 67 | PCIe Link9 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 68 | PCIe Link10 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 69 | PCIe Link11 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 70 | PCIe Link12 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 71 | PCIe Link13 | Critical Interrupt | Nominal | N/A | 'OK'
> > > 76 | CPU Popul Error | Processor | Nominal | N/A | 'OK'
> > > 77 | DIMM 1A | Slot/Connector | Nominal | N/A | 'Slot/Connector Device
> > > installed/attached'
> > > 79 | DIMM 1B | Slot/Connector | Nominal | N/A | 'Slot/Connector Device
> > > installed/attached'
> > > 81 | DIMM 1C | Slot/Connector | Nominal | N/A | 'Slot/Connector Device
> > > installed/attached'
> > > 83 | DIMM 1D | Slot/Connector | Nominal | N/A | 'Slot/Connector Device
> > > installed/attached'
> > > wfisc...@wfischer-t410-ubuntu:~$
> > >
> > >
> > > On Mon, 2010-06-21
> > > at 09:32 -0700, Al Chu wrote:
> > > > Hi Werner,
> > > >
> > > > > Does anybody know whether one of the other tools like freeipmi or
> > > > > ipmiutil has some functionality like this?
> > > >
> > > > In FreeIPMI, there is a tool called ipmimonitoring that I believe does
> > > > what you're asking for (output condensed for readability below).
> > > >
> > > > 18 | Fan1 | Nominal | 14500.00 | RPM | 'OK'
> > > > 19 | Fan2 | Nominal | 14300.00 | RPM | 'OK'
> > > > 20 | Fan3/CPU2 | Nominal | 14300.00 | RPM | 'OK'
> > > > 21 | Fan4/CPU1 | Nominal | 13900.00 | RPM | 'OK'
> > > > 22 | Fan5 | Nominal | 14000.00 | RPM | 'OK'
> > > > 23 | Fan6 | Nominal | 14000.00 | RPM | 'OK'
> > > > 24 | Fan7/CPU3 | Critical | 0.00 | RPM | 'At or Below
> > > > (<=) Lower Non-Recoverable Threshold'
> > > > 25 | Fan8/CPU4 | Critical | 0.00 | RPM | 'At or Below
> > > > (<=) Lower Non-Recoverable Threshold'
> > > > 26 | Fan9 | Critical | 0.00 | RPM | 'At or Below
> > > > (<=) Lower Non-Recoverable Threshold'
> > > > 27 | Power Supply 1 | Nominal | N/A | N/A | 'Presence
> > > > detected'
> > > > 28 | Power Supply 2 | N/A | N/A | N/A | N/A
> > > >
> > > > So for this example, fans with normal RPM are "Nominal", out of range is
> > > > "Critical", and the power supply that doesn't exist is "N/A". There is
> > > > also a "Warning" output when the situation is appropriate.
> > > >
> > > > I can speak more of it, but it's probably not best on this mailing.
> > > > Feel free to ping me on the FreeIPMI mailing list.
> > > >
> > > > Al
> > > >
> > > > On Mon, 2010-06-21 at 06:08 -0700, Werner Fischer wrote:
> > > > > Hi ipmitool developers,
> > > > >
> > > > > I thought about the problem regarding monitoring discrete IPMI sensors,
> > > > > that Brian reported back in April:
> > > > > http://www.mail-archive.com/[email protected]/msg01472.html
> > > > >
> > > > > I did some in-depth testing and looked how the current VMware ESXi 4.0
> > > > > reports different states of discrete IPMI sensors.
> > > > >
> > > > > I tested two example scenarios with an Intel SR2500 server:
> > > > >
> > > > > Test case 1:
> > > > > * Power Supply 2 removed
> > > > > * Chassis cover removed
> > > > > * VMware reports:
> > > > > http://www.wefi.net/shared/sr2500-example-1.png
> > > > >
> > > > > Test case 2:
> > > > > * Power Supply 2 present, but power cable removed
> > > > > * VMware reports:
> > > > > http://www.wefi.net/shared/sr2500-example-2.png
> > > > >
> > > > > (Below you find some example ipmitool outputs for these two cases).
> > > > >
> > > > > The current IPMI specification lists possible sensor-specific-offsets
> > > > > for each sensor type in table 42-3, Sensor Type Codes.
> > > > >
> > > > > To me it seems that VMware uses some mapping, which defines which
> > > > > offsets (assertions/deassertions) cause a warning or an alarm,
> > > > > e.g. an offset for the event "General Chassis Intrusion" for a Physical
> > > > > Security sensor (sensor type code 05h) leads to status "Warning".
> > > > >
> > > > > So my request:
> > > > > * introduce some new option for ipmitool (something like
> > > > > "ipmitool get-server-status") where ipmitool uses such kind of mapping,
> > > > > too. We could define which offsets/assertions should cause a
> > > > > warning. In this way an end-user would have an easy way to
> > > > > quickly find out whether or not everything is ok with his
> > > > > hardware...
> > > > >
> > > > > Currently using e.g.
> > > > > "ipmitool sdr elist all" returns "ok" for sensor
> > > > > states like "General Chassis Intrusion" (see below)
> > > > >
> > > > > What do you think?
> > > > > Any other ideas how we could accomplish that?
> > > > > Does anybody know whether one of the other tools like freeipmi or
> > > > > ipmiutil has some functionality like this?
> > > > >
> > > > > best regards,
> > > > > Werner
> > > > >
> > > > > PS: Here are the outputs of ipmitool for this:
> > > > >
> > > > > Test case 1:
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
> > > > > Password:
> > > > > PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
> > > > > PS2 AC Current | 79h | ns | 10.2 | No Reading
> > > > > PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
> > > > > PS2 +12V Current | 7Bh | ns | 10.2 | No Reading
> > > > > PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
> > > > > PS2 +12V Power | 7Dh | ns | 10.2 | No Reading
> > > > > PS1 Status | 70h | ok | 10.1 | Presence detected
> > > > > PS2 Status | 71h | ok | 10.2 |
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U monitor -L user sdr elist all | grep -i "Physical
> > > > > Scrty"
> > > > > Password:
> > > > > Physical Scrty | 05h | ok | 23.1 | General Chassis
> > > > > intrusion
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U admin raw 0x04 0x2d 0x70
> > > > > Password:
> > > > > Data length = 1
> > > > > 00 c0 01 00
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U admin raw 0x04 0x2d 0x71
> > > > > Password:
> > > > > Data length = 1
> > > > > 00 c0 00 00
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U admin -P relation sdr get "Physical Scrty"
> > > > > Sensor ID : Physical Scrty (0x5)
> > > > > Entity ID : 23.1 (System Chassis)
> > > > > Sensor Type (Discrete): Physical Security
> > > > > States Asserted :
> > > > > Physical Security
> > > > > [General Chassis intrusion]
> > > > > Assertion Events : Physical Security
> > > > > [General Chassis intrusion]
> > > > > Assertions Enabled : Physical Security
> > > > > [General Chassis intrusion]
> > > > > [System unplugged from LAN]
> > > > > Deassertions Enabled : Physical Security
> > > > > [General Chassis intrusion]
> > > > > [System unplugged from LAN]
> > > > >
> > > > > Test case 2:
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U monitor -L user sdr get "PS2 Status"
> > > > > Password:
> > > > > Sensor ID : PS2 Status (0x71)
> > > > > Entity ID : 10.2 (Power Supply)
> > > > > Sensor Type (Discrete): Power Supply
> > > > > States Asserted : Power Supply
> > > > > [Presence detected]
> > > > > [Power Supply AC lost]
> > > > > Assertion Events : Power Supply
> > > > > [Presence detected]
> > > > > [Power Supply AC lost]
> > > > > Assertions Enabled : Power Supply
> > > > > [Presence detected]
> > > > > [Failure detected]
> > > > > [Predictive failure]
> > > > > [Power Supply AC lost]
> > > > > [Config Error: Vendor Mismatch]
> > > > > [Config Error: Revision Mismatch]
> > > > > [Config Error: Processor Missing]
> > > > > [Config Error]
> > > > > Deassertions Enabled : Power Supply
> > > > > [Presence detected]
> > > > > [Failure detected]
> > > > > [Predictive failure]
> > > > > [Power Supply AC lost]
> > > > > [Config Error: Vendor Mismatch]
> > > > > [Config Error: Revision Mismatch]
> > > > > [Config Error: Processor Missing]
> > > > > [Config Error]
> > > > >
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
> > > > > Password:
> > > > > PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
> > > > > PS2 AC Current | 79h | ok | 10.2 | 0.12 Amps
> > > > > PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
> > > > > PS2 +12V Current | 7Bh | ok | 10.2 | 0 Amps
> > > > > PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
> > > > > PS2 +12V Power | 7Dh | ok | 10.2 | 0 Watts
> > > > > PS1 Status | 70h | ok | 10.1 | Presence detected
> > > > > PS2 Status | 71h | ok | 10.2 | Presence detected,
> > > > > Power Supply AC lost
> > > > > wfisc...@wfischer-t410-ubuntu:~$ ipmitool -I lan -H
> > > > > 192.168.1.211 -U admin raw 0x04 0x2d 0x71
> > > > > Password:
> > > > > Data length = 1
> > > > > 00 c0 09 00
> > > > > wfisc...@wfischer-t410-ubuntu:~$
> > > > >
> > > > >
> > > > --
> > > > Albert Chu
> > > > [email protected]
> > > > Computer Scientist
> > > > High Performance Systems Division
> > > > Lawrence Livermore National Laboratory
> > > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Freeipmi-users mailing list
> > > [email protected]
> > > http://lists.gnu.org/mailman/listinfo/freeipmi-users
> > >
> >
> --
> : Werner Fischer
> : Technology Specialist
> : Thomas-Krenn.AG | Speed is (y)our success
> : http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki
>
>
> _______________________________________________
> Freeipmi-users mailing list
> [email protected]
> http://lists.gnu.org/mailman/listinfo/freeipmi-users
>
--
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-users mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/freeipmi-users
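
The idea discussed in this thread (read the fourth column of the ipmimonitoring output and report OK to Nagios/Icinga only when it says "Nominal") could be sketched roughly as below. This is only a minimal illustration in Python, not the nagios_ipmimonitoring.pl plugin linked above. The function names, the constant names, and the sample text are made up for the example. It assumes the six-column output format of FreeIPMI 0.7.x shown in this thread (newer versions print a different layout), and it assumes that sensors whose status is "N/A" can simply be skipped, which may not suit every system.

```python
"""Sketch: turn ipmimonitoring output into a Nagios-style status."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Monitoring states named in the thread, mapped to exit codes.
STATE_TO_CODE = {"Nominal": OK, "Warning": WARNING, "Critical": CRITICAL}

# Ordering used to pick the worst state (an assumption of this sketch).
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}

# Hypothetical sample input, built from rows quoted in the thread.
SAMPLE_OUTPUT = """\
Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor Units | Sensor Reading
13 | Fan 1 | Fan | Nominal | RPM | 5891.000000
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
"""


def parse_ipmimonitoring(text):
    """Return (sensor name, monitoring status, reading) for each data row."""
    sensors = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip the header and anything that is not a full data row.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        sensors.append((fields[1], fields[3], fields[5]))
    return sensors


def nagios_status(sensors):
    """Return (exit code, message) for the worst monitoring status seen."""
    worst = OK
    problems = []
    for name, state, reading in sensors:
        if state == "N/A":
            continue  # assumption: a sensor without a state is ignored
        code = STATE_TO_CODE.get(state, UNKNOWN)
        if code != OK:
            problems.append(f"{name}: {state} ({reading})")
        if SEVERITY[code] > SEVERITY[worst]:
            worst = code
    detail = "; ".join(problems) if problems else "no problems reported"
    return worst, f"{LABELS[worst]} - {detail}"


if __name__ == "__main__":
    code, message = nagios_status(parse_ipmimonitoring(SAMPLE_OUTPUT))
    print(message)
```

A real plugin would run ipmimonitoring itself, pass the returned code to sys.exit(), and probably let the admin exclude sensors such as chassis intrusion; those parts are left out here on purpose.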
