When a cm_id is destroyed from work queue context while its lap_state is IB_CM_LAP_SENT, we need to release the reference on the id that was taken when the LAP message was sent. Otherwise, if the expected APR message is lost, the reference is released only after a long time, and until then the work handler thread is unavailable to process anything else.

This problem was reported by Moni Shoua <[email protected]> and Amir Vadai <[email protected]>

Signed-off-by: Sean Hefty <[email protected]>
---
Good catch, although, I think we can simplify the fix to the patch below (completely untested). Please let me know if this solves the issue for you.

 drivers/infiniband/core/cm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1d9616b..79da42d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -888,6 +888,8 @@ retest:
 				     NULL, 0, NULL, 0);
 		break;
 	case IB_CM_ESTABLISHED:
+		if (cm_id->lap_state == IB_CM_LAP_SENT)
+			ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		spin_unlock_irq(&cm_id_priv->lock);
 		ib_send_cm_dreq(cm_id, NULL, 0);
 		goto retest;
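
For anyone following the thread without the CM code open, here is a toy userspace model of the reference problem described above. It is only a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the refcount is a plain int with no locking, and "cancel" is modelled as completing the send immediately. It shows only why dropping the send's reference at destroy time lets the destroy finish without waiting for the APR or a timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not real kernel types. */
enum toy_lap_state { TOY_LAP_IDLE, TOY_LAP_SENT };

struct toy_cm_id {
	int refcount;                 /* 1 for the owner, +1 per outstanding send */
	enum toy_lap_state lap_state;
	bool msg_outstanding;         /* a LAP is still waiting for its APR */
};

static void toy_init(struct toy_cm_id *id)
{
	id->refcount = 1;
	id->lap_state = TOY_LAP_IDLE;
	id->msg_outstanding = false;
}

/* Sending a LAP takes a reference that is held until the send completes. */
static void toy_send_lap(struct toy_cm_id *id)
{
	id->refcount++;
	id->msg_outstanding = true;
	id->lap_state = TOY_LAP_SENT;
}

/* Assumption: cancelling completes the send at once and drops its reference. */
static void toy_cancel_msg(struct toy_cm_id *id)
{
	if (!id->msg_outstanding)
		return;
	assert(id->refcount > 0);
	id->msg_outstanding = false;
	id->refcount--;
}

/*
 * Drop the owner's reference. Returns true if the destroy can finish
 * right away (no references left), false if it would have to wait for
 * the outstanding send to complete or time out.
 */
static bool toy_destroy(struct toy_cm_id *id, bool cancel_pending_lap)
{
	if (cancel_pending_lap && id->lap_state == TOY_LAP_SENT)
		toy_cancel_msg(id);
	assert(id->refcount > 0);
	id->refcount--;
	return id->refcount == 0;
}
```

In this model, calling toy_destroy() with cancel_pending_lap set to false after toy_send_lap() leaves one reference behind, which corresponds to the work handler being stuck until the send times out. With it set to true, the count reaches zero immediately.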
