> I spent the last couple of days retracing my steps. In my haste, I
> listed the wrong HCA firmware revision. It was firmware 1.2.940 that
> caused the system to crash while booting to Linux. I have the mthca
> driver built into the kernel; it is not a loadable driver. The system
> boots fine with the 1.2.0 firmware.
Oh, so the crash depends on the firmware version? That's a big clue: you're using mem-free firmware, which means the HCA uses system memory to store big chunks of its internal state. If something goes wrong with how that memory is mapped to the HCA (or with how the HCA writes to it), the result could be memory corruption, possibly tied to posting receives to the hardware as part of MAD initialization.

So it could be a driver bug exposed by the new firmware, or it could be a firmware bug. Is Mellanox following this bug? Maybe they have some idea of how to figure out what the HCA could be doing that would crash a system.

- R.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
