> I dumped out the CQNs as they were created and generally the first
> non-reserved CQs get made by ipoib_transport_dev_init() when ipoib
> is brought up on each port. CQN 0x80 is used by port 0, 0x81 by
> port 1.
Actually, I think the first two CQs are created by the MAD module:

> Dec 2 10:19:23 r6i1n8 kernel: ib_mthca 0000:06:00.0: CQ overrun on CQN 000080
> Dec 2 10:19:23 r6i1n8 kernel: ib_mad: Fatal error (1) on MAD QP (1)

It seems that there is a CQ error and then ib_mad gets a catastrophic
error on its QP.

Given that you are seeing CQ overruns on two completely different types
of QPs, I think it's more likely that there is a problem with how the
mthca driver updates the CQ consumer index than that your test is
triggering two independent bugs.

What kind of hardware was this on again? It's x86-64, right? But is
there anything out of the ordinary about these systems?

 - R.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
