Hi,

I'm trying out APM with OFED 1.2, using a Mellanox dual-port HCA (ib_mthca driver). When I migrate several RC QPs (software-triggered migration using ib_modify_qp), I've noticed that sometimes one or two of the remote QPs never generate an IB_EVENT_PATH_MIG or even an IB_EVENT_PATH_MIG_ERR; the event just seems to get lost.

I looked through some of the ib_mthca patches in git.kernel.org/?p=linux/kernel/git/roland/infiniband.git and incorporated the mmiowb patch for ib_mthca commands ( http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commit;h=76d7cc0345a037e8eea426f8abc710abd22946dd), but I am still seeing the same issue.

I have a test case that repeats software-triggered migration + rearming in a loop. The problem usually occurs in the first few cycles, but it is not too frequent. If anyone has any ideas on what might be wrong, or tips on where to look to debug this, that would be very much appreciated!
For example, this is the console output I see (printed by our RC QP event handler).

On the local end, which initiates the software-triggered migration using ib_modify_qp:

Event IB_EVENT_PATH_MIG occurred on QP#1043
Event IB_EVENT_PATH_MIG occurred on QP#1040
Event IB_EVENT_PATH_MIG occurred on QP#1033

On the remote end:

Event IB_EVENT_PATH_MIG occurred on QP#1040
Event IB_EVENT_PATH_MIG occurred on QP#1043

Thanks so much for any pointers!
Lan
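One way to pin down which QPs lose the event in each cycle of such a test loop is to record the QP numbers that are expected to report IB_EVENT_PATH_MIG and tick them off as the event handler sees them. The following is only a minimal sketch in plain C under stated assumptions: `struct mig_tracker`, the `mig_tracker_*` functions, and `MIG_MAX_QPS` are hypothetical names made up for illustration and are not part of OFED or the ib_mthca driver. It also does no locking, which a real event handler would likely need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIG_MAX_QPS 64 /* hypothetical upper bound on QPs per test cycle */

/* Hypothetical bookkeeping for one migration cycle (not an OFED structure). */
struct mig_tracker {
    uint32_t qpn[MIG_MAX_QPS];       /* QP numbers expected to report an event */
    unsigned char seen[MIG_MAX_QPS]; /* 1 once the event was observed */
    size_t count;                    /* number of valid entries in qpn[] */
};

/* Reset the tracker at the start of each migrate + rearm cycle. */
void mig_tracker_init(struct mig_tracker *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Register a QP whose migration was just requested. Returns 0, or -1 if full. */
int mig_tracker_expect(struct mig_tracker *t, uint32_t qpn)
{
    if (t->count >= MIG_MAX_QPS)
        return -1;
    t->qpn[t->count] = qpn;
    t->seen[t->count] = 0;
    t->count++;
    return 0;
}

/* Called from the event handler. Returns 0 if qpn was expected, -1 otherwise. */
int mig_tracker_mark(struct mig_tracker *t, uint32_t qpn)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->qpn[i] == qpn) {
            t->seen[i] = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Copy up to out_len QP numbers that never reported an event into out[]
 * and return the total number of missing QPs.
 */
size_t mig_tracker_missing(const struct mig_tracker *t, uint32_t *out,
                           size_t out_len)
{
    size_t missing = 0;

    for (size_t i = 0; i < t->count; i++) {
        if (!t->seen[i]) {
            if (missing < out_len)
                out[missing] = t->qpn[i];
            missing++;
        }
    }
    return missing;
}
```

With the output above, registering QPs 1043, 1040 and 1033 and then marking the two events seen on the remote end would leave QP 1033 as the only entry reported by mig_tracker_missing(), assuming it is called after a timeout that is long enough for all events to arrive.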
