Hey Kevin,

Just to make sure: are you using the newest FreeIPMI, 0.6.7? There have been a number of corner-case fixes in the last 4 or 5 minor releases.
On Sat, 2008-09-13 at 16:25 -0500, Kevin Day wrote:
> Hey, FreeIPMI guys! I'm trying to get sensor monitoring going on a
> wide range of hardware running FreeBSD. It's mostly just worked, with
> a couple of exceptions.
>
> The first problem is with some older Dell 2650 servers. This is how
> they appear in dmidecode:
>
> System Information
> Manufacturer: Dell Computer Corporation
> Product Name: PowerEdge 2650
>
> IPMI Device Information
> Interface Type: SMIC (Server Management Interface Chip)
> Specification Version: 1.0
> I2C Slave Address: 0x10
> NV Storage Device: Not Present
> Base Address: 0x000000000000ECF4 (I/O)
> Register Spacing: Successive Byte Boundaries
>
>
> Some commands seem to kinda work like ipmi-sel, ipmi-fru (the
> exception being the date):
>
> # ipmi-fru -v
> FRU Inventory Device ID: 0x00
>
> FRU Board Info Area Manufacturing Date/Time: 05/22/10 - 17:36:00
> FRU Board Manufacturer: Dell Inc.
> FRU Board Product Name: Dell Remote Access Controller
> FRU Board Part Number: A03
>
> FRU Product Manufacturer Name: Dell Inc.
> FRU Product Product Name: Dell Remote Access Controller
> FRU Product Part/Model Number: RAC V1.0
> FRU Product Version Type: 3.12
>
> FRU Management Access Record Length Incorrect: 20

This is odd. The record length simply does not match what the record is supposed to have. It's possible it's a corner case in my parsing or an issue with their motherboard (I've already seen a few motherboards with FRU data that is non-compliant and I have to work around it). Could you send me the output of ipmi-fru with --debug?

>
> but sensors don't seem to, giving
> "ipmi_cmd_get_sensor_reading_discrete: bad completion code: request
> data/parameter invalid" at everything I do. I can provide full logs or
> any debugging if it's okay to post anything that long here. "ipmitool
> sensor list" seems to work perfectly, oddly enough. Where do I start
> to try to figure out what's wrong?

Looks like the motherboard is reporting an error that I am not working around/handling properly in ipmi-sensors/ipmimonitoring. I think ipmitool works because it ignores errors on individual sensors and outputs all the remaining sensors it can. Could you send me the --debug output of ipmi-sensors?

>
>
> The other issue I'm having is on a much newer HP Proliant DL185 G5. It
> appears as:
>
> System Information
> Manufacturer: HP
> Product Name: ProLiant DL185 G5
>
> IPMI Device Information
> Interface Type: KCS (Keyboard Control Style)
> Specification Version: 2.0
> I2C Slave Address: 0x10
> NV Storage Device: Not Present
> Base Address: 0x0000000000000CA2 (I/O)
> Register Spacing: Successive Byte Boundaries
>
> ipmi-sensors itself seems to work okay at the beginning, but errors
> out at the end:
>
> 64: POST Error (System Firmware): [Unknown]
> 112: Memory ECC (Memory): [Unknown]
> 160: ACPI State (ACPI Power State): [S0/G0 "working"]
> 208: System Reset (Module/Board): [OK]
> 256: SYSTEM FAN 1 (Fan): 6435.01 RPM (0.00/1000.40): [OK]
> 320: SYSTEM FAN 2 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 384: SYSTEM FAN 3 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 448: SYSTEM FAN 4 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 512: Rear HDD Opt Fan (Fan): 1904.76 RPM (0.00/1000.40): [OK]
> 576: System 12V (Voltage): 12.26 V (NA/NA): [OK]
> 640: System 5V (Voltage): 5.17 V (NA/NA): [OK]
> 704: System AUX 5V (Voltage): 5.19 V (NA/NA): [OK]
> 768: System 3.3V (Voltage): 3.37 V (NA/NA): [OK]
> 832: System AUX 3.3V (Voltage): 3.34 V (NA/NA): [OK]
> 896: CPU0 Vcore (Voltage): 1.39 V (NA/NA): [OK]
> 960: CPU1 Vcore (Voltage): 1.33 V (NA/NA): [OK]
> 1024: CPU0 Mem Vcore (Voltage): 1.81 V (NA/NA): [OK]
> 1088: CPU1 Mem Vcore (Voltage): 1.80 V (NA/NA): [OK]
> 1152: CPU0 MEM VTT (Voltage): 0.94 V (NA/NA): [OK]
> 1216: CPU1 MEM VTT (Voltage): 0.92 V (NA/NA): [OK]
> 1280: NB SB Vcore (Voltage): 1.23 V (NA/NA): [OK]
> 1344: CPU0 Diode (Temperature): 33.00 C (NA/85.00): [OK]
> 1408: CPU1 Diode (Temperature): 36.50 C (NA/85.00): [OK]
> 1472: Power Ambient (Temperature): 4.00 C (NA/45.00): [OK]
> 1536: Rear Ambient (Temperature): 7.00 C (NA/45.00): [OK]
> 1600: SB HTX Ambient (Temperature): 0.00 C (NA/45.00): [OK]
> 1664: NB Ambient (Temperature): 0.00 C (NA/45.00): [OK]
> 1728: Front Panel Temp (Temperature): 13.50 C (NA/45.00): [OK]
> 1792: Therm-Trip0 (Processor): [State Deasserted]
> 1840: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> 1888: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> 1936: CPU Socket 0 (Processor): [Device Inserted/Device Present]
> 1984: CPU Socket 1 (Processor): [Device Inserted/Device Present]
> 2032: PS1 Present (Power Supply): [Device Inserted/Device Present]
> 2080: PS2 Present (Power Supply): [Device Removed/Device Absent]
> 2128: PS1 Status (Power Supply): [Performance Met]
> 2176: PS2 Status (Power Supply): [Performance Met]
> 2224: Red PS Present (Power Unit): [Device Inserted/Device Present]
> 2272: PS Redundancy (FRU Sensor): [Redundancy Lost]
> 2416: Identify (Button): [State Deasserted]
> ipmi_cmd_get_sensor_reading_discrete: bad completion code: request
> data/parameter invalid

Again, it looks like the motherboard is reporting an error that I am not working around/handling properly. Could you send me the --debug output?

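As a rough illustration of the tolerant behaviour described above (report what can be read, mark the rest "Unknown"), here is a minimal Python sketch. It is not FreeIPMI or ipmitool source. The names `read_all`, `fake_bmc`, `CompletionCodeError`, and `TOLERATED` are hypothetical, and the choice of which completion codes to tolerate is an assumption; only the numeric values come from the IPMI specification's generic completion-code table.

```python
# Hypothetical sketch, not FreeIPMI or ipmitool code.
# Numeric values follow the IPMI generic completion-code table.
NODE_BUSY = 0xC0            # BMC busy
SENSOR_NOT_PRESENT = 0xCB   # requested sensor, data, or record not present
INVALID_DATA_FIELD = 0xCC   # invalid data field in request
ILLEGAL_FOR_SENSOR = 0xCD   # command illegal for this sensor or record type

# Assumption: these are the codes a tool might choose to tolerate.
TOLERATED = {NODE_BUSY, SENSOR_NOT_PRESENT, INVALID_DATA_FIELD, ILLEGAL_FOR_SENSOR}


class CompletionCodeError(Exception):
    """Raised by a reader when the BMC returns a non-zero completion code."""

    def __init__(self, code):
        super().__init__(f"bad completion code 0x{code:02X}")
        self.code = code


def read_all(record_ids, read_sensor):
    """Read every sensor; report "Unknown" for tolerated errors, re-raise others."""
    results = []
    for rid in record_ids:
        try:
            results.append((rid, read_sensor(rid)))
        except CompletionCodeError as err:
            if err.code not in TOLERATED:
                raise
            results.append((rid, "Unknown"))
    return results


def fake_bmc(rid):
    """Stand-in for a real BMC: record 2 always fails with 0xCC."""
    if rid == 2:
        raise CompletionCodeError(INVALID_DATA_FIELD)
    return "OK"


if __name__ == "__main__":
    for rid, state in read_all([1, 2, 3], fake_bmc):
        print(f"{rid}: [{state}]")
```

With this structure a single misbehaving sensor no longer aborts the whole listing, and the --debug output is still needed to learn which code the board actually returns.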
> > but, ipmimonitoring gets fixated on record 832: > > Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor > Units | Sensor Reading > 256 | SYSTEM FAN 1 | Fan | Nominal | RPM | 6435.006435 > 320 | SYSTEM FAN 2 | Fan | Nominal | RPM | 6265.664160 > 384 | SYSTEM FAN 3 | Fan | Nominal | RPM | 6105.006105 > 448 | SYSTEM FAN 4 | Fan | Nominal | RPM | 6435.006435 > 512 | Rear HDD Opt Fan | Fan | Nominal | RPM | 1904.761905 > 576 | System 12V | Voltage | Nominal | V | 12.264000 > 640 | System 5V | Voltage | Nominal | V | 5.171400 > 704 | System AUX 5V | Voltage | Nominal | V | 5.194800 > 768 | System 3.3V | Voltage | Nominal | V | 3.372600 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800 > 832 | System AUX 3.3V | Voltage | Nominal 
| V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800

Is this looping forever or does it complete? This one's a little more fishy. It looks like I am accidentally storing incorrect data. Could you send me the --debug output?

> Are either of these known problems? If not, what can I do to help?

These are not known problems to me. The odds are high that the motherboard is reporting some strange error that I need to handle or work around. For example, ipmi-sensors/ipmimonitoring will report a sensor reading as "Unknown" on "cannot read sensor", "bmc busy", and similar error codes. We just need to find out what error code those motherboards are reporting and handle it properly.

Thanks,
Al

> -- Kevin

--
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-users mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/freeipmi-users
