Hey Gregor,

I finally found it.  The SDR you sent me was the key to figuring it out.
The OEM data in your SDR cache was quite large (55 bytes), which
triggered a buffer overflow.

The length of the buffer was actually handled properly in the code,
except that the uint8_t buffer was cast to an unsigned int * buffer.
When the OEM data is small (< 14 bytes) there's no problem.
The overflow only occurs when we hit large OEM data sizes.
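
For anyone following along, the failure mode looks roughly like the
sketch below.  This is not the actual FreeIPMI code: OEM_BUF_LEN,
bytes_touched() and copy_oem_data() are hypothetical names, and the
56-byte buffer size is an assumption picked only because it is roughly
consistent with the "< 14 bytes" threshold when unsigned int is 4 bytes
wide (as on i386 and x86_64).  The point is that a length check done in
elements is fine for uint8_t, but is off by a factor of
sizeof(unsigned int) once the same buffer is accessed through an
unsigned int *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer size in bytes, chosen only for illustration. */
#define OEM_BUF_LEN 56

/* Number of bytes touched when `count` elements are stored through a
 * pointer whose element type is `elem_size` bytes wide. */
static size_t bytes_touched(size_t count, size_t elem_size)
{
    return count * elem_size;
}

/* Safe variant: keep the uint8_t type and bounds-check in bytes.
 * Returns the number of bytes copied, or -1 if src does not fit. */
static int copy_oem_data(uint8_t *dst, size_t dst_len,
                         const uint8_t *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len > dst_len)
        return -1;              /* refuse instead of overflowing */
    memcpy(dst, src, src_len);
    return (int)src_len;
}
```

Under these assumptions, 13 values written through the unsigned int *
view touch 52 bytes and still fit, while 55 values touch 220 bytes and
run far past the end of the buffer.  Keeping the pointer as uint8_t *
and checking lengths in bytes avoids the mismatch.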

Thanks for working on this with me so much.  I'll send you a new
test.tar.gz later on in a private e-mail.

Al

On Tue, 2007-10-09 at 17:25 +0200, Gregor Dschung wrote:
> Hey Al,
> 
> Here is the SDR cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
> for gtseval-ipmi; 'sdr-cache-p300slg01.10.136.17.170' is another cache
> file, from a call of ipmi-sensors that works fine.
> 
> I'm using FreeIPMI on a system with SUSE 10.1.
> ---------
> p300slg01:/usr/local/src # uname -a
> Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007 i686 i686 i386 GNU/Linux
> ---------
> 
> In your test4-code, I had to change the following lines to compile w/o
> errors:
> common/src/pstdout.c
> -243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
> +243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
> +501: va_list vacpy;
> 
> ---------
> 
> I've tested FreeIPMI locally again. I was wrong: it crashes, too. I
> guess I confused it with IPMItool, which runs fine locally but gives
> warnings over the network. I don't know whether this helps you:
> Locally:
> [EMAIL PROTECTED]:~/ipmi/usr/bin> ./ipmitool -I open sensor
> ACPI State       | 0x1        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
> POST Error       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> Memory ECC       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> PCI Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> Fan Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 2        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 3        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 4        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 5        | 9223.391   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 6        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 7        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 8        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 9        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU Fan 10       | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> System Fan 1     | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> System Fan 2     | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> CPU0 Vcore       | 1.107      | Volts      | ok    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
> CPU1 Vcore       | na         | Volts      | na    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
> Standby 5V       | 4.969      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
> System 5V        | 4.851      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
> System 3.3V      | 3.234      | Volts      | ok    | na        | 2.822     | 2.999     | 3.675     | 3.851     | na
> 3V CMOS Sense    | 3.028      | Volts      | ok    | na        | 2.617     | 2.781     | na        | na        | na
> CPU0 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> CPU1 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> CPU0 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> CPU1 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> AMB Temp         | 29.000     | degrees C  | ok    | na        | 10.000    | na        | 30.000    | 45.000    | na
> MultiBit ECC ER  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> VDD Power Fail   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> Reset            | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> Identify         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> NMI              | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> CPU0 Therm-Trip  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> CPU1 Therm-Trip  | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> CPU0 IERR        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> CPU1 IERR        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> CPU0 Prochot     | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> CPU1 Prochot     | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> CPU0 SocketOcc   | 0x1        | discrete   | 0x0280| na        | na        | na        | na        | na        | na
> CPU1 SocketOcc   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> CPU0 Dmn 0 Temp  | 45.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
> CPU1 Dmn 0 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000
> CPU0 Dmn 1 Temp  | 46.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
> CPU1 Dmn 1 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000
> 
> Over an RMCP+ session:
> [...]
> System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
> Error reading sensor POST Error (#01)
> Error reading sensor Memory ECC (#02)
> Error reading sensor PCI Error (#03)
> Error reading sensor Fan Error (#04)
> Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> [...]
> 
> The omitted lines are identical to the local output.
> -----------
> 
> I've also called ipmi-sensors from an x86_64 machine to reach
> gtseval-ipmi, and it crashes with the same error (second attachment).
> 
> So... Enough debugging for today.
> 
> Have a nice day,
> Gregor
> 
> Al Chu wrote:
> > Hey Gregor,
> >
> > Although it's unlikely to be your problem, I saw one other potential
> > issue, so I added a fix in this slightly newer tar.gz.
> >
> > Thanks,
> > Al
> >
> > On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
> >> Hey Gregor,
> >>
> >> Here's another tar.gz.  Could you run ./configure with --enable-debug
> >> and run with --debug again?  The gdb output confirms the line I believed
> >> was causing the problem, but I still can't quite figure out how the
> >> corruption is happening.  So I put in a lot more printfs.
> >>
> >> I do have at least two other suspicions that depend on your system.  So
> >> do you think you could also send me the SDR from ~/.freeipmi/sdr-cache/
> >> for me to analyze, and could you also tell me what Linux you are running
> >> on the i386 box?  I'm wondering if you have some older distribution (b/c
> >> it's i386) and it has slightly different threads behavior that I'm not
> >> handling properly.
> >>
> >> Thanks,
> >> Al
> >>
> >>
> >> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
> >>> Hi Al,
> >>>
> >>> I attach again the output of the call with --debug and the backtrace. It
> >>> was the first time that I used gdb, so I hope I understood the tutorials
> >>> :)
> >>>
> >>> At the moment I'm not able to run ipmi-sensors locally, because I'm not
> >>> root on "gtseval" (the host of gtseval-ipmi) and I have to wait until I
> >>> get rw-rights for /dev/ipmi0 again. And it's the weekend ;)
> >>>
> >>> You are right, I'm running IPMItool and FreeIPMI on an i386. gtseval
> >>> is a 64-bit system, so perhaps this is the reason for it not crashing
> >>> locally.
> >>>
> >>> Have a nice Sunday,
> >>> Gregor
> >>>
> >>>
> >>>> Hey Gregor,
> >>>>
> >>>> Can't see anything suspicious in the code.  Here's another tar.gz to which
> >>>> I added a whole bunch of extra printfs to try and give me more information.
> >>>> Could you run it again (./configure --enable-debug and run ipmi-sensors with
> >>>> --debug again)?  Also, you mentioned that ipmi-sensors completes locally
> >>>> without issue.  Is the number of sensors listed below (ending w/ CPU1 Dmn
> >>>> 1 Temp) the same as the number of sensors listed when you run locally?
> >>>>
> >>>> Also, is a core dump being output by this crash?  Could you run gdb
> >>>> against the core and get a backtrace?  That'd be a lot of help too.
> >>>>
> >>>> Thanks for helping me look into this,
> >>>>
> >>>> Al
> >>>>
> >>>>> Hi Al,
> >>>>>
> >>>>> thanks for your fast answer.
> >>>>>
> >>>>> I've tested your test version and it seems to be on the right track. It
> >>>>> still crashes, but now I get sensor data :) :
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>
> >>>> --
> >>>> Albert Chu
> >>>> [EMAIL PROTECTED]
> >>>> 925-422-5311
> >>>> Computer Scientist
> >>>> High Performance Systems Division
> >>>> Lawrence Livermore National Laboratory
> >>>>
> 
> 
-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory


_______________________________________________
Freeipmi-devel mailing list
Freeipmi-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
