Hey Gregor,

I finally found it. The SDR you sent me was the key to figuring it out. The OEM data in your SDR cache was quite large (55 bytes), which triggered a buffer overflow.
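This class of bug can be illustrated with a small C sketch. It is not the FreeIPMI code, which this message doesn't show. The sketch assumes a uint8_t buffer with a fixed capacity (the hypothetical OEM_BUF_LEN) and that `count` elements are stored through an unsigned int * that really points into that byte buffer. Every name in it is illustrative. Each store through the wider pointer writes sizeof(unsigned int) bytes instead of one, so a length that is valid in bytes can still run past the end of the buffer. The safe variant keeps the destination typed as uint8_t and checks the capacity before copying.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity of the byte buffer that receives OEM data.
 * The real buffer size in FreeIPMI is not shown in this thread. */
#define OEM_BUF_LEN 64

/* Bytes actually touched if `count` elements are stored through an
 * unsigned int * that points into a uint8_t buffer: each store writes
 * sizeof(unsigned int) bytes, not 1. */
static size_t bytes_written_via_uint_cast(size_t count)
{
  return count * sizeof(unsigned int);
}

/* Nonzero if the mistaken cast would run past the end of a buffer of
 * `capacity` bytes, even when `count` itself fits in `capacity`. */
static int cast_would_overflow(size_t count, size_t capacity)
{
  return bytes_written_via_uint_cast(count) > capacity;
}

/* Safe version: keep the destination typed as bytes and check its
 * capacity first. Returns the number of bytes copied, or -1 if the
 * data would not fit. */
static int copy_oem_data(uint8_t *dst, size_t dst_len,
                         const uint8_t *src, size_t len)
{
  if (len > dst_len)
    return -1;
  memcpy(dst, src, len);
  return (int)len;
}
```

With the 4-byte unsigned int used on both i686 and x86_64 Linux, cast_would_overflow(10, OEM_BUF_LEN) is false (40 bytes), while cast_would_overflow(55, OEM_BUF_LEN) is true (220 bytes). copy_oem_data() still accepts the same 55 bytes.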
The length of the buffer was actually handled properly in the code, except that the uint8_t buffer was cast to an unsigned int * buffer. Obviously, when the OEM data is small (< 14 bytes) there's no problem; it only occurs with large OEM data sizes.

Thanks so much for working on this with me. I'll send you a new test.tar.gz later on in a private e-mail.

Al

On Tue, 2007-10-09 at 17:25 +0200, Gregor Dschung wrote:
> Hey Al,
>
> here is the sdr-cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
> for gtseval-ipmi, 'sdr-cache-p300slg01.10.136.17.170' is another cache
> file from a call of ipmi-sensors which works fine.
>
> I'm using FreeIPMI on a system with SUSE 10.1.
> ---------
> p300slg01:/usr/local/src # uname -a
> Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007 i686 i686 i386 GNU/Linux
> ---------
>
> In your test4-code, I had to change the following lines to compile w/o
> errors:
> common/src/pstdout.c
> -243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
> +243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
> +501: va_list vacpy;
>
> ---------
>
> I've tested FreeIPMI locally again. I was wrong, it crashes, too. I
> guess I was confused with IPMItool, which runs fine locally but gives
> warnings over the network.
> Don't know whether it helps you:
> Locally:
> [EMAIL PROTECTED]:~/ipmi/usr/bin> ./ipmitool -I open sensor
> ACPI State       | 0x1        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> System Reset     | 0x0        | discrete   | 0x0080 | na       | na       | na       | na       | na       | na
> POST Error       | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> Memory ECC       | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> PCI Error        | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> Fan Error        | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> Watchdog         | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> CPU Fan 1        | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 2        | 10426.441  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 3        | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 4        | 10426.441  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 5        | 9223.391   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 6        | 10900.371  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 7        | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 8        | 10900.371  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 9        | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU Fan 10       | 10426.441  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> System Fan 1     | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> System Fan 2     | 10900.371  | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> CPU0 Vcore       | 1.107      | Volts      | ok     | na       | 0.402    | 0.500    | 1.597    | 1.695    | na
> CPU1 Vcore       | na         | Volts      | na     | na       | 0.402    | 0.500    | 1.597    | 1.695    | na
> Standby 5V       | 4.969      | Volts      | ok     | na       | 4.263    | 4.528    | 5.527    | 5.792    | na
> System 5V        | 4.851      | Volts      | ok     | na       | 4.263    | 4.528    | 5.527    | 5.792    | na
> System 3.3V      | 3.234      | Volts      | ok     | na       | 2.822    | 2.999    | 3.675    | 3.851    | na
> 3V CMOS Sense    | 3.028      | Volts      | ok     | na       | 2.617    | 2.781    | na       | na       | na
> CPU0 Therm Diode | na         | degrees C  | na     | na       | 10.000   | na       | 68.000   | 80.000   | 95.000
> CPU1 Therm Diode | na         | degrees C  | na     | na       | 10.000   | na       | 68.000   | 80.000   | 95.000
> CPU0 ThermDiode2 | na         | degrees C  | na     | na       | 10.000   | na       | 68.000   | 80.000   | 95.000
> CPU1 ThermDiode2 | na         | degrees C  | na     | na       | 10.000   | na       | 68.000   | 80.000   | 95.000
> AMB Temp         | 29.000     | degrees C  | ok     | na       | 10.000   | na       | 30.000   | 45.000   | na
> MultiBit ECC ER  | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> VDD Power Fail   | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> Reset            | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> Identify         | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> NMI              | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> CPU0 Therm-Trip  | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> CPU1 Therm-Trip  | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> CPU0 IERR        | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> CPU1 IERR        | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> CPU0 Prochot     | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> CPU1 Prochot     | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> CPU0 SocketOcc   | 0x1        | discrete   | 0x0280 | na       | na       | na       | na       | na       | na
> CPU1 SocketOcc   | 0x0        | discrete   | 0x0180 | na       | na       | na       | na       | na       | na
> CPU0 Dmn 0 Temp  | 45.000     | degrees C  | ok     | na       | na       | na       | na       | 85.000   | 95.000
> CPU1 Dmn 0 Temp  | na         | degrees C  | na     | na       | na       | na       | na       | 85.000   | 95.000
> CPU0 Dmn 1 Temp  | 46.000     | degrees C  | ok     | na       | na       | na       | na       | 85.000   | 95.000
> CPU1 Dmn 1 Temp  | na         | degrees C  | na     | na       | na       | na       | na       | 85.000   | 95.000
>
> Over an RMCP+ session:
> [...]
> System Reset     | 0x0        | discrete   | 0x0080 | na       | na       | na       | na       | na       | na
> Error reading sensor POST Error (#01)
> Error reading sensor Memory ECC (#02)
> Error reading sensor PCI Error (#03)
> Error reading sensor Fan Error (#04)
> Watchdog         | na         | discrete   | na     | na       | na       | na       | na       | na       | na
> CPU Fan 1        | 9992.006   | RPM        | ok     | na       | na       | na       | 3996.803 | 3475.480 | na
> [...]
>
> The missing lines are the same as in the local output.
> -----------
>
> I've called ipmi-sensors from an x86_64 to reach gtseval-ipmi, too. And
> it crashes with the same error (second attachment).
>
> So... Enough debugging for today.
>
> Have a nice day,
> Gregor
>
> Al Chu wrote:
> > Hey Gregor,
> >
> > Although it's unlikely to be your problem, I saw one other potential
> > issue. So I added a fix in this slightly newer tar.gz.
> >
> > Thanks,
> > Al
> >
> > On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
> >> Hey Gregor,
> >>
> >> Here's another tar.gz. Could you run ./configure with --enable-debug
> >> and run with --debug again? The gdb output confirms the line I believed
> >> was causing the problem, but I still can't quite figure out how the
> >> corruption is happening. So I put in a lot more printfs.
> >>
> >> I do have at least two other suspicions that depend on your system. So
> >> do you think you could also send me the SDR from ~/.freeipmi/sdr-cache/
> >> for me to analyze, and could you also tell me what Linux you are running
> >> on the i386 box? I'm wondering if you have some older distribution (b/c
> >> it's i386) and it has slightly different threads behavior that I'm not
> >> handling properly.
> >>
> >> Thanks,
> >> Al
> >>
> >>
> >> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
> >>> Hi Al,
> >>>
> >>> I attach again the output of the call with --debug and the backtrace. It
> >>> was the first time that I used gdb, so I hope I understood the tutorials
> >>> :)
> >>>
> >>> At the moment I'm not able to run ipmi-sensors locally, because I'm not
> >>> root on "gtseval" (the host of gtseval-ipmi) and I have to wait until I
> >>> get rw-rights for /dev/ipmi0 again. And it's the weekend ;)
> >>>
> >>> You are right, I'm running IPMItool and FreeIPMI on an i386. gtseval
> >>> is a 64-bit system, so perhaps this is the reason for not crashing
> >>> locally.
> >>>
> >>> Have a nice Sunday,
> >>> Gregor
> >>>
> >>>
> >>>> Hey Gregor,
> >>>>
> >>>> Can't see anything suspicious in the code. Here's another tar.gz to
> >>>> which I added a whole bunch of extra printfs to try and give me more
> >>>> information. Could you run it again (./configure --enable-debug and run
> >>>> ipmi-sensors with --debug again)? Also, you mentioned that ipmi-sensors
> >>>> completes locally without issue. Is the number of sensors listed below
> >>>> (ending w/ CPU1 Dmn 1 Temp) the same as the number of sensors listed
> >>>> when you run locally?
> >>>>
> >>>> Also, is a core dump being output by this crash? Could you run gdb
> >>>> against the core and get a backtrace? That'd be a lot of help too.
> >>>>
> >>>> Thanks for helping me look into this,
> >>>>
> >>>> Al
> >>>>
> >>>>> Hi Al,
> >>>>>
> >>>>> thanks for your fast answer.
> >>>>>
> >>>>> I've tested your test-version and it seems to be on the right track. It
> >>>>> still crashes, but now I get sensor-data :) :
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>
> >>>> --
> >>>> Albert Chu
> >>>> [EMAIL PROTECTED]
> >>>> 925-422-5311
> >>>> Computer Scientist
> >>>> High Performance Systems Division
> >>>> Lawrence Livermore National Laboratory
> >>>>
> >
--
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
