I spent the last couple of days retracing my steps. In my haste, I listed the wrong HCA firmware revision. It was firmware 1.2.940 that caused the system to crash while booting to Linux. I have the mthca driver built into the kernel; it is not a loadable driver. The system boots fine with the 1.2.0 firmware.
You are correct; the system crash does not make sense, since the stack was okay a few instructions earlier. I am currently looking at the error dump in the system log to see if I can find out more. There does not appear to be a timing issue in the driver function ib_mad_post_receive_mads(): adding debug printk messages to this function does not make the system crash go away.

On Thu, Mar 26, 2009 at 8:54 AM, Roland Dreier <[email protected]> wrote:
> > System crashes with three Mellanox mezzanine cards (VID=15b3,
> > DID=0x6274) installed when booting Linux (ia64). I am using Linux
> > 2.6.24, but this issue also occurs with Linux kernel 2.6.29-rc8.
>
> this is a pretty interesting crash. Do you have the ib_mthca driver
> built into your kernel, or is it being loaded as a module?
>
> > A partial listing from ib_mad_post_receive_mad.S is posted below the
> > "C" code.
> > The exact instruction that cause the system crash was located at
> >
> > ib_mad_post_*+0x0080    st4 [r2]=r3             MII
> >                         nop.i 0x0
> >                         nop.i 0x0
> >
> > It tries to store r3=0x1600 to [r2] @ 0xE0000007E01C7CCC.
>
> Looking at the assembly, it seems the relevant parts are:
>
> ib_mad_post_*+0x0060    ld4 r3=[r11]            MMI
>                         st8 [r2]=r8
>                         adds r2=28,r12
> ib_mad_post_*+0x0070    st4 [r9]=r10            MMI
>                         st8 [r45]=r0
>                         nop.i 0x0;;
> ib_mad_post_*+0x0080    st4 [r2]=r3             MII
>
> The main points are "adds r2=28,r12" -- ie r2 now points into the
> stack -- and "st4 [r2]=r3" -- ie a store onto the stack is crashing.
>
> In the same function, we have "adds r9=56,r12" and "st4 [r9]=r10"
> slightly earlier, so the stack isn't totally messed up (apparently).
>
> Not sure how to debug this since the crash as it stands doesn't seem to
> make much sense...
