> The thing that all the IPoIB failures have in common seems to be
> an appearance of a "CQ overrun" in syslog, e.g.:
>
> ib_mthca 0000:06:00.0: CQ overrun on CQN 180082
> We are using MT25204 HCAs with 1.2.0 firmware, and OFED 1.2.

OFED 1.2 uses a separate CQ for send completions in connected mode. (I'm assuming you're using the OFED default of connected mode for IPoIB.) I guess it would be useful to know which CQ is overrunning, i.e., whether it is the main IPoIB CQ or one of the CM send CQs. One way to check this would be to add a print to mthca that dumps the CQN when a CQ is created, and also to add prints to IPoIB just before each call to ib_create_cq(), so that the CQNs can be correlated.

Another thing you could try is a 2.6.24-rc kernel (or, I guess, an OFED 1.3 prerelease), which has a change that moves all IPoIB completions into one CQ. This may fix the bug by accident.

 - R.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
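As a rough illustration of the CQN-correlation step suggested above, here is a small userspace C sketch that scans kernel log lines once the extra debug prints are in place. Everything it matches on beyond the existing overrun warning is hypothetical: the print formats "ipoib: creating CQ <label>" and "mthca: created CQN <hex>", the struct cq_map type, and the function find_overrun_cq() stand in for whatever prints you actually add. It also assumes the creation print uses hexadecimal, like the existing overrun warning (which appears to format the CQN as six hex digits, so "180082" would be 0x180082), and that each IPoIB print is immediately followed by the matching mthca print.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL debug-print formats, assumed to be added by hand:
 *   IPoIB, just before each ib_create_cq():  "ipoib: creating CQ <label>"
 *   mthca, once the CQ number is assigned:   "mthca: created CQN <hex>"
 * The overrun line is the existing warning, e.g.
 *   "ib_mthca 0000:06:00.0: CQ overrun on CQN 180082"
 */
#define MAX_CQS   64
#define LABEL_LEN 32

#define IPOIB_TAG   "ipoib: creating CQ "
#define CREATE_TAG  "mthca: created CQN "
#define OVERRUN_TAG "CQ overrun on CQN "

struct cq_map {
    unsigned long cqn[MAX_CQS];
    char label[MAX_CQS][LABEL_LEN];
    size_t count;
};

/* Walk the log lines in order, pairing each mthca "created CQN" line with
 * the most recent IPoIB "creating CQ" label. At the first overrun line,
 * return the label recorded for that CQN, or "unknown" if none matches.
 * The returned string points into *map (or is a string literal). */
const char *find_overrun_cq(const char *const *lines, size_t n,
                            struct cq_map *map)
{
    char pending[LABEL_LEN] = "";
    assert(lines != NULL && map != NULL);
    map->count = 0;

    for (size_t i = 0; i < n; i++) {
        const char *p;
        unsigned long cqn;

        if ((p = strstr(lines[i], IPOIB_TAG)) != NULL) {
            /* The next mthca creation line should belong to this label. */
            snprintf(pending, sizeof pending, "%s", p + strlen(IPOIB_TAG));
        } else if ((p = strstr(lines[i], CREATE_TAG)) != NULL &&
                   sscanf(p + strlen(CREATE_TAG), "%lx", &cqn) == 1 &&
                   map->count < MAX_CQS) {
            map->cqn[map->count] = cqn;
            /* A CQ created with no preceding IPoIB print came from elsewhere. */
            snprintf(map->label[map->count], LABEL_LEN, "%s",
                     pending[0] ? pending : "non-IPoIB");
            pending[0] = '\0';
            map->count++;
        } else if ((p = strstr(lines[i], OVERRUN_TAG)) != NULL &&
                   sscanf(p + strlen(OVERRUN_TAG), "%lx", &cqn) == 1) {
            for (size_t j = 0; j < map->count; j++)
                if (map->cqn[j] == cqn)
                    return map->label[j];
            return "unknown";
        }
    }
    return "unknown";
}
```

Fed the dmesg lines in order, this reports whether the overrunning CQN was created right after, say, the main IPoIB CQ print or a CM send CQ print. If another driver creates CQs between an IPoIB print and its mthca print, the pairing can go wrong, so the result is best treated as a hint rather than proof.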
