>I updated the bug with the step-by-step instructions how to burn
>the FW and reproduce the error.
>I compiled this "how-to" today, so everything there is up to date.

Thanks - I don't think that I was programming my FW correctly before. I still have problems running opensm with qos enabled on one of my systems, but it works on the other system.

Anyway, I was able to reproduce the problem, and I believe I understand part of it. The send for the CM REQ MAD never completes. A completion never shows up on the GSI's CQ with a wr_id that matches the send wr_id. (I don't see a completion at all.) This results in a reference being held on the ib_cm id that is never released, which causes the hang. (Destruction of the ib_cm id hangs, which blocks the destruction of the rdma_cm_id, which blocks the close from userspace.)

If the ib_cm is modified to use SL 0 for the CM MADs, but the connection still uses SL 1, then ucmatose is able to connect and transfer data between the client and server.

- Sean
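
For anyone following the thread, the hang chain described above can be modeled roughly as below. This is only a toy userspace sketch in Python, not the ib_cm code: the class ToyCmId, its methods, and the timeout are all hypothetical names made up for illustration. It assumes the id is protected by a reference count, that posting the send takes a reference, and that only the send completion drops it.

```python
import threading


class ToyCmId:
    """Toy stand-in for an id guarded by a reference count (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1                      # reference owned by the creator
        self._released = threading.Event()  # set once the count reaches zero

    def _get(self):
        with self._lock:
            self._refs += 1

    def _put(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self._released.set()

    def post_send(self):
        # Posting a send takes a reference that only the completion drops.
        self._get()

    def send_completed(self):
        # Hypothetical completion handler: releases the send's reference.
        self._put()

    def destroy(self, timeout):
        # Drop the creator's reference, then wait for all others to go away.
        # The timeout exists only so the demo returns; the reported hang
        # corresponds to waiting here with no timeout at all.
        self._put()
        return self._released.wait(timeout)


def demo(completion_arrives):
    cm_id = ToyCmId()
    cm_id.post_send()
    if completion_arrives:
        cm_id.send_completed()
    return cm_id.destroy(timeout=0.1)
```

Calling demo(True) models the SL 0 case, where the completion arrives and destroy finishes. Calling demo(False) models the missing completion: the wait in destroy is never satisfied, and everything queued behind it (in the report, the rdma_cm_id destruction and the userspace close) would be stuck as well.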
