Hi.

lbt wrote:
Hi,

I'm trying out APM with OFED 1.2, using a Mellanox dual-port HCA (ib_mthca driver). I have several RC QPs that I am trying to migrate (software-triggered migration using ib_modify_qp). Sometimes one or two of the remote QPs never generate an IB_EVENT_PATH_MIG, or even an IB_EVENT_PATH_MIG_ERR; the event seems to just get lost. I looked through some of the ib_mthca patches in http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git and incorporated the mmiowb patch for ib_mthca commands (http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commit;h=76d7cc0345a037e8eea426f8abc710abd22946dd), but I'm still seeing the same issue. I have a test case that repeats software-triggered migration plus rearming in a loop. The problem usually shows up within the first few cycles, though it is not very frequent. If anyone has ideas on what might be wrong, or tips on where to look or what to do to debug this, that would be very much appreciated!

For example, this is the console output I see (printed by our RC QP event handler). On the local end, which initiates the software-triggered migration using ib_modify_qp:
Event IB_EVENT_PATH_MIG occurred on QP#1043
Event IB_EVENT_PATH_MIG occurred on QP#1040
Event IB_EVENT_PATH_MIG occurred on QP#1033

On the remote end:
Event IB_EVENT_PATH_MIG occurred on QP#1040
Event IB_EVENT_PATH_MIG occurred on QP#1043
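One way to narrow this down is to have the event handler count events per QP and, after each migrate/rearm cycle, list the QPs that reported neither IB_EVENT_PATH_MIG nor IB_EVENT_PATH_MIG_ERR. The following is a minimal, hypothetical sketch in plain C. The names struct qp_track, record_event and report_lost are made up for illustration and are not OFED APIs. It assumes the handler can pass in the QP number and whether the event was the error variant. In the log above the remote QP numbers happen to line up with the local ones; in general you would match peers through the dest_qp_num QP attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-QP bookkeeping for the migrate + rearm test loop. */
struct qp_track {
    unsigned qp_num;     /* QP number as printed by the event handler */
    unsigned mig_events; /* IB_EVENT_PATH_MIG events seen so far */
    unsigned mig_errs;   /* IB_EVENT_PATH_MIG_ERR events seen so far */
};

/* Call from the event handler. Returns 0 if the QP is tracked, -1 otherwise. */
static int record_event(struct qp_track *t, size_t n, unsigned qp_num, int is_err)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].qp_num != qp_num)
            continue;
        if (is_err)
            t[i].mig_errs++;
        else
            t[i].mig_events++;
        return 0;
    }
    return -1; /* event for a QP we are not tracking */
}

/* After cycle `cycle` (1-based), every QP should have reported `cycle`
 * events in total; print and count the ones that fell behind. */
static size_t report_lost(const struct qp_track *t, size_t n, unsigned cycle)
{
    size_t lost = 0;

    assert(cycle > 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].mig_events + t[i].mig_errs < cycle) {
            printf("cycle %u: no path-migration event for QP#%u\n",
                   cycle, t[i].qp_num);
            lost++;
        }
    }
    return lost;
}
```

Replaying the remote-side log above (events for QP#1040 and QP#1043 in cycle 1) and then calling report_lost(..., 1) on a table of QPs 1043, 1040 and 1033 flags QP#1033 as the one that never reported. Running this check inside the loop shows in which cycle an event first goes missing.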

Is the timeout value (in the QP attributes) 0?
If the answer is no, can you please supply some more details on this?
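For reference, the libibverbs ibv_modify_qp man page documents the timeout attribute as follows. Zero is a special value meaning wait an infinite time for the ACK/NACK. Any other value means 4.096 * 2^timeout usec. The helper below is only a sketch for decoding it. The name ack_timeout_usec is hypothetical, and returning -1.0 for "infinite" is a convention chosen here, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the 5-bit local ACK timeout QP attribute into microseconds.
 * Returns -1.0 for timeout == 0, which means wait indefinitely for the
 * ACK/NACK (no retransmission timeout). */
static double ack_timeout_usec(uint8_t timeout)
{
    assert(timeout <= 31); /* the field is 5 bits wide */
    if (timeout == 0)
        return -1.0;
    return 4.096 * (double)(1u << timeout);
}
```

For example, timeout = 14 works out to roughly 67 ms, while 0 never times out.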


thanks
Dotan
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general