As a note to other developers: I've also added a few extra notes about the -v and -vv options to the ipmi-sensors manpage in HEAD.

Al

On Wed, 2007-10-10 at 09:26 -0700, Al Chu wrote:
> Hey Gregor,
>
> There is a subtlety here that I added extra documentation for in the
> FreeIPMI 0.5.0 manpage (I didn't backport to 0.4.X b/c I didn't think it
> was that important, but maybe I should have). The ipmi-sensors numbers
> listed on the left are "record ids", not sensor numbers. If you use the
> verbose options on ipmi-sensors (-v or -vv), you can find the sensor
> numbers. As an example on my system:
>
> Record ID: 22
> Sensor Name: Fan5
> Group Name: Fan
> Sensor Number: 18
> Event/Reading Type Code: 1h
>
> you can see the sensor number and record id don't match up.
>
> I'm not 100% sure why record ids were chosen for input/output over sensor
> numbers in ipmi-sensors (the tool was originally created by others), but
> if I had to guess at some reasons why:
>
> - some sensors don't have sensor numbers. I notice multiple sensors w/
> sensor number 0x00 in the ipmitool output below. I would guess those
> sensors don't have a number so they just output 0x00.
>
> - record ids increase in value, while sensor numbers need not, so
> outputting record ids looks nicer, maybe? The output order in ipmitool
> also seems to be record id based, but they just output the sensor number
> instead of the record id.
>
> As an FYI if you were wondering why sensors seem to be missing from
> ipmi-sensors, our default output does not include every sensor.
> Some are only retrievable via the verbose options.
>
> Hope that helps clarify things.
>
> Al
>
> On Wed, 2007-10-10 at 11:06 +0200, Gregor Dschung wrote:
> > Hey Al,
> >
> > mmmh.... now, I'm really confused. I thought the sensor-id has to be 8
> > bits long?
> >
> > Also I'm confused about the different sensor-ids I'm getting with
> > ipmi-sensors (0.4.6.beta2) and `ipmitool sdr elist` (1.8.6). Sure,
> > ipmitool is giving me the sensor id as hex and ipmi-sensors as a decimal
> > number... but the converted value should be the same?
> > I would like to set up a PEF-Table, but for that, I'll need the right
> > sensor-ids :-/
> >
> > Example 1:
> >
> > p300slg01:/usr/local/src # ipmitool -H gtseval-ipmi -U ADMIN -a sdr elist all
> > Password:
> > Hewlett-Packard | 00h | ok | 0.0 | Dynamic MC @ 20h
> > ACPI State | 20h | ok | 0.0 | S0/G0: working
> > System Reset | 21h | ok | 0.0 |
> > POST Error | 01h | ns | 0.0 | Disabled
> > Memory ECC | 02h | ns | 0.0 | Disabled
> > PCI Error | 03h | ns | 0.0 | Disabled
> > Fan Error | 04h | ns | 0.0 | Disabled
> > Watchdog | FEh | ns | 0.0 | Disabled
> > CPU Fan 1 | 31h | ok | 0.0 | 9592.33 RPM
> > CPU Fan 2 | 32h | ok | 0.0 | 10426.44 RPM
> > CPU Fan 3 | 33h | ok | 0.0 | 9992.01 RPM
> > CPU Fan 4 | 34h | ok | 0.0 | 10900.37 RPM
> > CPU Fan 5 | 35h | ok | 0.0 | 9592.33 RPM
> > CPU Fan 6 | 3Ch | ok | 0.0 | 10900.37 RPM
> > CPU Fan 7 | 3Dh | ok | 0.0 | 9992.01 RPM
> > CPU Fan 8 | 3Eh | ok | 0.0 | 10426.44 RPM
> > CPU Fan 9 | 3Fh | ok | 0.0 | 9592.33 RPM
> > CPU Fan 10 | 40h | ok | 0.0 | 10426.44 RPM
> > System Fan 1 | 41h | ok | 0.0 | 9992.01 RPM
> > System Fan 2 | 42h | ok | 0.0 | 10900.37 RPM
> > CPU0 Vcore | 3Ah | ok | 3.0 | 1.10 Volts
> > CPU1 Vcore | 3Bh | ns | 3.1 | No Reading
> > Standby 5V | 37h | ok | 0.0 | 4.97 Volts
> > System 5V | 36h | ok | 0.0 | 4.85 Volts
> > System 3.3V | 38h | ok | 0.0 | 3.23 Volts
> > 3V CMOS Sense | 39h | ok | 0.0 | 3.03 Volts
> > CPU0 Therm Diode | 43h | ns | 3.0 | Disabled
> > CPU1 Therm Diode | 44h | ns | 3.1 | Disabled
> > CPU0 ThermDiode2 | 52h | ns | 3.0 | Disabled
> > CPU1 ThermDiode2 | 53h | ns | 3.1 | Disabled
> > AMB Temp | 48h | ok | 0.0 | 29 degrees C
> > MultiBit ECC ER | 4Ah | ok | 0.0 | State Deasserted
> > VDD Power Fail | 4Ch | ok | 0.0 | State Deasserted
> > Reset | 4Dh | ok | 0.0 | State Deasserted
> > Identify | 4Eh | ok | 0.0 | State Deasserted
> > NMI | 50h | ok | 0.0 | State Deasserted
> > CPU0 Therm-Trip | 55h | ok | 3.0 | State Deasserted
> > CPU1 Therm-Trip | 56h | ns | 3.1 | No Reading
> > CPU0 IERR | 57h | ok | 3.0 | State Deasserted
> > CPU1 IERR | 58h | ns | 3.1 | No Reading
> > CPU0 Prochot | 59h | ok | 3.0 | Limit Not Exceeded
> > CPU1 Prochot | 5Ah | ns | 3.1 | No Reading
> > CPU0 SocketOcc | 5Bh | ok | 3.0 | Device Present
> > CPU1 SocketOcc | 5Ch | ok | 3.1 | Device Absent
> > CPU0 Dmn 0 Temp | 86h | ok | 3.0 | 45 degrees C
> > CPU1 Dmn 0 Temp | 89h | ns | 3.1 | No Reading
> > CPU0 Dmn 1 Temp | 8Ch | ok | 3.0 | 45 degrees C
> > CPU1 Dmn 1 Temp | 8Fh | ns | 3.1 | No Reading
> > FRU0 | 00h | ns | 0.0 | Logical FRU @00h
> > ----------
> > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u ADMIN -P
> > Password:
> > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > 112: System Reset (Module/Board): [OK]
> > 160: POST Error (System Firmware): [Unknown]
> > 208: Memory ECC (Memory): [Unknown]
> > 256: PCI Error (Critical Interrupt): [Unknown]
> > 304: Fan Error (Cooling Device): [Unknown]
> > 352: Watchdog (Watchdog 2): [Unknown]
> > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 592: CPU Fan 4 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 784: CPU Fan 7 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 848: CPU Fan 8 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 1040: System Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 1168: CPU0 Vcore (Voltage): 1.10 V (0.40/1.70): [OK]
> > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > 2160: Reset (Button): [State Deasserted]
> > 2208: Identify (Button): [State Deasserted]
> > 2304: NMI (Button): [State Deasserted]
> > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > 2448: CPU0 IERR (Processor): [State Deasserted]
> > 2496: CPU1 IERR (Processor): [State Deasserted]
> > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> >
> > Example 2:
> > p300slg01:/usr/local/src # ipmitool -H gts00-ipmi -U ADMIN -a sdr elist all
> > Password:
> > pef | FDh | ns | 46.1 | Event-Only
> > watchdog | FEh | ns | 46.1 | Event-Only
> > KIM BMC | 00h | ok | 0.0 | Dynamic MC @ 20h
> > PLTFRM SECURITY | FCh | ns | 0.0 | Event-Only
> > CPU Temp 1 | 00h | ok | 3.0 | 22 degrees C
> > CPU Temp 2 | 01h | ok | 3.0 | 21 degrees C
> > CPU Temp 3 | 02h | ns | 3.1 | No Reading
> > CPU Temp 4 | 03h | ns | 3.1 | No Reading
> > Sys Temp | 04h | ok | 7.0 | 36 degrees C
> > CPU1 Vcore | 05h | ok | 3.0 | 1.19 Volts
> > CPU2 Vcore | 06h | ok | 3.1 | 1.21 Volts
> > 3.3V | 07h | ok | 7.0 | 3.34 Volts
> > 5V | 08h | ok | 7.0 | 4.99 Volts
> > 12V | 09h | ok | 7.0 | 11.52 Volts
> > -12V | 0Ah | ok | 7.0 | -12.30 Volts
> > 1.5V | 0Bh | ok | 7.0 | 1.47 Volts
> > 5VSB | 0Ch | ok | 7.0 | 4.92 Volts
> > VBAT | 0Dh | ok | 7.0 | 3.31 Volts
> > Fan1 | 0Eh | ok | 7.0 | 4400 RPM
> > Fan2 | 0Fh | lnr | 7.0 | 0 RPM
> > Fan3 | 10h | ok | 7.0 | 4400 RPM
> > Fan4 | 11h | lnr | 7.0 | 0 RPM
> > Fan5 | 12h | lnr | 7.0 | 0 RPM
> > Fan6 | 13h | lnr | 7.0 | 0 RPM
> > Fan7/CPU1 | 14h | lnr | 3.0 | 0 RPM
> > Fan8/CPU2 | 15h | lnr | 3.0 | 0 RPM
> > Intrusion | 44h | lnc | 23.1 | 0 unspecified
> > Power Supply | 16h | ok | 10.0 | 0 unspecified
> > CPU0 Internal E | 17h | ok | 3.0 | 0 unspecified
> > CPU1 Internal E | 18h | ok | 3.1 | 0 unspecified
> > CPU Overheat | 19h | ok | 3.0 | 0 unspecified
> > Thermal Trip0 | 1Ah | ok | 3.0 | 0 unspecified
> > Thermal Trip1 | 1Bh | ok | 3.1 | 0 unspecified
> > BIOS | 00h | ok | 0.0 |
> > --------
> > p300slg01:/usr/local/src # ipmi-sensors -h gts00-ipmi -u ADMIN -P
> > Password:
> > 4: CPU Temp 1 (Temperature): 22.00 C (NA/78.00): [OK]
> > 5: CPU Temp 2 (Temperature): 21.00 C (NA/78.00): [OK]
> > 6: CPU Temp 3 (Temperature): 0.00 C (NA/78.00): [OK]
> > 7: CPU Temp 4 (Temperature): 0.00 C (NA/78.00): [OK]
> > 8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
> > 9: CPU1 Vcore (Voltage): 1.20 V (1.06/1.63): [OK]
> > 10: CPU2 Vcore (Voltage): 1.21 V (1.06/1.63): [OK]
> > 11: 3.3V (Voltage): 3.34 V (2.93/3.66): [OK]
> > 12: 5V (Voltage): 4.99 V (4.44/5.54): [OK]
> > 13: 12V (Voltage): 11.52 V (10.56/13.44): [OK]
> > 14: -12V (Voltage): -12.30 V (-10.59/-13.40): [OK]
> > 15: 1.5V (Voltage): 1.47 V (1.31/1.68): [OK]
> > 16: 5VSB (Voltage): 4.92 V (4.44/5.54): [OK]
> > 17: VBAT (Voltage): 3.31 V (2.93/3.66): [OK]
> > 18: Fan1 (Fan): 4400.00 RPM (300.00/NA): [OK]
> > 19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 20: Fan3 (Fan): 4300.00 RPM (300.00/NA): [OK]
> > 21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 22: Fan5 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 23: Fan6 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
> > 27: Power Supply (Power Supply): [OK]
> > 28: CPU0 Internal E (Module/Board): [OK]
> > 29: CPU1 Internal E (Module/Board): [OK]
> > 30: CPU Overheat (Module/Board): [OK]
> > 31: Thermal Trip0 (Module/Board): [OK]
> > 32: Thermal Trip1 (Module/Board): [OK]
> > 33: BIOS (System Firmware): [Unknown]
> >
> >
> > I hope I only forgot something and that's not a new bug.
> >
> > Regards,
> > Gregor
> >
> >
> > Gregor Dschung wrote:
> > > Hey Al,
> > >
> > > whoa!!!
> > >
> > > THAT is OpenSource :). We've been mailing for perhaps a week (I guess it would
> > > have taken only about three days if we had both worked in the same
> > > timezone ;) ).
> > > And now, the issue seems to be solved:
> > > -----------
> > > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u admin -P
> > > Password:
> > > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > > 112: System Reset (Module/Board): [OK]
> > > 160: POST Error (System Firmware): [Unknown]
> > > 208: Memory ECC (Memory): [Unknown]
> > > 256: PCI Error (Critical Interrupt): [Unknown]
> > > 304: Fan Error (Cooling Device): [Unknown]
> > > 352: Watchdog (Watchdog 2): [Unknown]
> > > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 592: CPU Fan 4 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 784: CPU Fan 7 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 848: CPU Fan 8 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 1040: System Fan 1 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 1168: CPU0 Vcore (Voltage): 1.11 V (0.40/1.70): [OK]
> > > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > > 2160: Reset (Button): [State Deasserted]
> > > 2208: Identify (Button): [State Deasserted]
> > > 2304: NMI (Button): [State Deasserted]
> > > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > > 2448: CPU0 IERR (Processor): [State Deasserted]
> > > 2496: CPU1 IERR (Processor): [State Deasserted]
> > > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > -------------
> > >
> > > Thanks a lot for your help.
> > >
> > > Regards,
> > > Gregor
> > >
> > >
> > > Albert Chu wrote:
> > >> Hey Gregor,
> > >>
> > >> Doh! I forgot a patch. Here's the next likely FreeIPMI 0.4.6 release
> > >> :-)
> > >>
> > >> PLMK if it works.
> > >>
> > >> Thanks,
> > >> Al
> > >>
> > >>> Hey Gregor,
> > >>>
> > >>> Attached are two tar.gz files. One is a likely candidate for the
> > >>> FreeIPMI 0.4.6 release and the other is a test tar.gz for debug info if
> > >>> something new goes wrong :-)
> > >>>
> > >>> PLMK how it works out. Thanks for all the debug help.
> > >>>
> > >>> Al
> > >>>
> > >>> On Tue, 2007-10-09 at 17:25 +0200, Gregor Dschung wrote:
> > >>>> Hey Al,
> > >>>>
> > >>>> here is the sdr-cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
> > >>>> for gtseval-ipmi, 'sdr-cache-p300slg01.10.136.17.170' is another cache
> > >>>> file from a call of ipmi-sensors which works fine.
> > >>>>
> > >>>> I'm using FreeIPMI on a system with SUSE 10.1.
> > >>>> ---------
> > >>>> p300slg01:/usr/local/src # uname -a
> > >>>> Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007 i686 i686 i386 GNU/Linux
> > >>>> ---------
> > >>>>
> > >>>> In your test4-code, I had to change the following lines to compile w/o
> > >>>> errors:
> > >>>> common/src/pstdout.c
> > >>>> -243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
> > >>>> +243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
> > >>>> +501: va_list vacpy;
> > >>>>
> > >>>> ---------
> > >>>>
> > >>>> I've tested FreeIPMI locally again. I was wrong, it crashes, too. I
> > >>>> guess I was confusing it with IPMItool, which runs fine locally but gives
> > >>>> warnings over the network. Don't know whether it helps you:
> > >>>> Locally:
> > >>>> [EMAIL PROTECTED]:~/ipmi/usr/bin> ./ipmitool -I open sensor
> > >>>> ACPI State | 0x1 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> System Reset | 0x0 | discrete | 0x0080| na | na | na | na | na | na
> > >>>> POST Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Memory ECC | na | discrete | na | na | na | na | na | na | na
> > >>>> PCI Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Fan Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Watchdog | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 2 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 3 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 4 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 5 | 9223.391 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 6 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 7 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 8 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 9 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 10 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> System Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> System Fan 2 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU0 Vcore | 1.107 | Volts | ok | na | 0.402 | 0.500 | 1.597 | 1.695 | na
> > >>>> CPU1 Vcore | na | Volts | na | na | 0.402 | 0.500 | 1.597 | 1.695 | na
> > >>>> Standby 5V | 4.969 | Volts | ok | na | 4.263 | 4.528 | 5.527 | 5.792 | na
> > >>>> System 5V | 4.851 | Volts | ok | na | 4.263 | 4.528 | 5.527 | 5.792 | na
> > >>>> System 3.3V | 3.234 | Volts | ok | na | 2.822 | 2.999 | 3.675 | 3.851 | na
> > >>>> 3V CMOS Sense | 3.028 | Volts | ok | na | 2.617 | 2.781 | na | na | na
> > >>>> CPU0 Therm Diode | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU1 Therm Diode | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU0 ThermDiode2 | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU1 ThermDiode2 | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> AMB Temp | 29.000 | degrees C | ok | na | 10.000 | na | 30.000 | 45.000 | na
> > >>>> MultiBit ECC ER | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> VDD Power Fail | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> Reset | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> Identify | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> NMI | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> CPU0 Therm-Trip | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> CPU1 Therm-Trip | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 IERR | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> CPU1 IERR | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 Prochot | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> CPU1 Prochot | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 SocketOcc | 0x1 | discrete | 0x0280| na | na | na | na | na | na
> > >>>> CPU1 SocketOcc | 0x0 | discrete | 0x0180| na | na | na | na | na | na
> > >>>> CPU0 Dmn 0 Temp | 45.000 | degrees C | ok | na | na | na | na | 85.000 | 95.000
> > >>>> CPU1 Dmn 0 Temp | na | degrees C | na | na | na | na | na | 85.000 | 95.000
> > >>>> CPU0 Dmn 1 Temp | 46.000 | degrees C | ok | na | na | na | na | 85.000 | 95.000
> > >>>> CPU1 Dmn 1 Temp | na | degrees C | na | na | na | na | na | 85.000 | 95.000
> > >>>>
> > >>>> Over an RMCP+ session:
> > >>>> [...]
> > >>>> System Reset | 0x0 | discrete | 0x0080| na | na | na | na | na | na
> > >>>> Error reading sensor POST Error (#01)
> > >>>> Error reading sensor Memory ECC (#02)
> > >>>> Error reading sensor PCI Error (#03)
> > >>>> Error reading sensor Fan Error (#04)
> > >>>> Watchdog | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> [...]
> > >>>>
> > >>>> The omitted lines are identical.
> > >>>> -----------
> > >>>>
> > >>>> I've called ipmi-sensors from an x86_64 to reach gtseval-ipmi, too. And
> > >>>> it crashes with the same error (second attachment).
> > >>>>
> > >>>> So... Enough debugging for today.
> > >>>>
> > >>>> Have a nice day,
> > >>>> Gregor
> > >>>>
> > >>>> Al Chu wrote:
> > >>>>> Hey Gregor,
> > >>>>>
> > >>>>> Although it's unlikely to be your problem, I saw one other potential issue.
> > >>>>> So I added a fix in this slightly newer tar.gz.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Al
> > >>>>>
> > >>>>> On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
> > >>>>>> Hey Gregor,
> > >>>>>>
> > >>>>>> Here's another tar.gz. Could you run ./configure with --enable-debug
> > >>>>>> and run with --debug again? The gdb output confirms the line I believed
> > >>>>>> was causing the problem, but I still can't quite figure out how the
> > >>>>>> corruption is happening. So I put in a lot more printfs.
> > >>>>>>
> > >>>>>> I do have at least two other suspicions that depend on your system. So
> > >>>>>> do you think you could also send me the SDR from ~/.freeipmi/sdr-cache/
> > >>>>>> for me to analyze, and also could you tell me what Linux you are running
> > >>>>>> on the i386 box? I'm wondering if you have some older distribution (b/c
> > >>>>>> it's i386) and it has slightly different threads behavior that I'm not
> > >>>>>> handling properly.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Al
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
> > >>>>>>> Hi Al,
> > >>>>>>>
> > >>>>>>> I attach again the output of the call with --debug and the backtrace. It
> > >>>>>>> was the first time that I used gdb, so I hope I understood the tutorials
> > >>>>>>> :)
> > >>>>>>>
> > >>>>>>> At the moment I'm not able to run ipmi-sensors locally, because I'm not
> > >>>>>>> root on "gtseval" (the host of gtseval-ipmi) and I have to wait until I get
> > >>>>>>> rw-rights for /dev/ipmi0 again. And we have week-end ;)
> > >>>>>>>
> > >>>>>>> You are right, I'm running the IPMItool and FreeIPMI on an i386.
> > >>>>>>> gtseval itself is a 64-bit system, so perhaps this is the reason for not
> > >>>>>>> crashing locally.
> > >>>>>>>
> > >>>>>>> Have a nice Sunday,
> > >>>>>>> Gregor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hey Gregor,
> > >>>>>>>>
> > >>>>>>>> Can't see anything suspicious in the code. Here's another tar.gz that I
> > >>>>>>>> added a whole bunch of extra printfs to, to try and give me more information.
> > >>>>>>>> Could you run again (./configure --enable-debug and run ipmi-sensors with
> > >>>>>>>> --debug again)? Also, you mentioned that ipmi-sensors completes locally
> > >>>>>>>> without issue. Is the number of sensors listed below (ending w/ CPU1 Dmn
> > >>>>>>>> 1 Temp) the same as the number of sensors listed when you run locally?
> > >>>>>>>> Also, is a core dump being output by this crash? Could you run gdb
> > >>>>>>>> against the core and get a backtrace? That'd be a lot of help too.
> > >>>>>>>>
> > >>>>>>>> Thanks for helping me look into this,
> > >>>>>>>>
> > >>>>>>>> Al
> > >>>>>>>>
> > >>>>>>>>> Hi Al,
> > >>>>>>>>>
> > >>>>>>>>> thanks for your fast answer.
> > >>>>>>>>>
> > >>>>>>>>> I've tested your test-version and it seems to be on the right track. It
> > >>>>>>>>> still crashes, but now I get sensor-data :) :
> > >>>>>>>>>
> > >>>>>>>>> [...]
> > >>>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Albert Chu
> > >>>>>>>> [EMAIL PROTECTED]
> > >>>>>>>> 925-422-5311
> > >>>>>>>> Computer Scientist
> > >>>>>>>> High Performance Systems Division
> > >>>>>>>> Lawrence Livermore National Laboratory
> > >>>>>>>>
> > >>> --
> > >>> Albert Chu
> > >>> [EMAIL PROTECTED]
> > >>> 925-422-5311
> > >>> Computer Scientist
> > >>> High Performance Systems Division
> > >>> Lawrence Livermore National Laboratory
> > >>>
> > >
> >
> >
--
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-devel mailing list
Freeipmi-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
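
Editorial sketch (not part of the original messages): the record id versus sensor number distinction explained at the top of this thread can be illustrated with a few lines of Python. The sample text is copied from the thread (the verbose excerpt and two `ipmitool sdr elist` rows). The function names are hypothetical, and the parsing assumes the verbose output is made of plain "Key: Value" lines exactly as in the quoted excerpt; real `ipmi-sensors -v` output may differ between FreeIPMI versions.

```python
import re

def parse_verbose_records(text):
    """Split 'Key: Value' blocks (assumed layout, as quoted in the thread) into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records

def parse_elist_sensor_numbers(text):
    """Map sensor name -> sensor number from rows like 'Fan5 | 12h | lnr | 7.0 | 0 RPM'."""
    numbers = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and re.fullmatch(r"[0-9A-Fa-f]{2}h", fields[1]):
            numbers[fields[0]] = int(fields[1][:-1], 16)   # '12h' -> 18
    return numbers

def cross_reference(verbose_text, elist_text):
    """Return (name, record id, sensor number, ipmitool number or None) tuples."""
    elist = parse_elist_sensor_numbers(elist_text)
    return [
        (r["Sensor Name"], int(r["Record ID"]), int(r["Sensor Number"]),
         elist.get(r["Sensor Name"]))
        for r in parse_verbose_records(verbose_text)
    ]

# Sample data copied from the messages in this thread.
VERBOSE_SAMPLE = """Record ID: 22
Sensor Name: Fan5
Group Name: Fan
Sensor Number: 18
Event/Reading Type Code: 1h
"""
ELIST_SAMPLE = """Fan4 | 11h | lnr | 7.0 | 0 RPM
Fan5 | 12h | lnr | 7.0 | 0 RPM
"""

for name, record_id, sensor_number, ipmitool_number in cross_reference(
        VERBOSE_SAMPLE, ELIST_SAMPLE):
    print(f"{name}: record id {record_id}, sensor number {sensor_number}, "
          f"ipmitool {ipmitool_number}")
```

With the sample data, the hex value 12h from ipmitool converts to 18, which equals the "Sensor Number" field, while the number ipmi-sensors prints on the left (22) is the record id. For a PEF table it is therefore the sensor number from the verbose output that lines up with ipmitool's hex ids.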