When a cm_id is destroyed from work queue context while its lap_state is
IB_CM_LAP_SENT, we need to release the reference on the id that was taken
when the LAP message was sent.  Otherwise, if the expected APR message is
lost, the reference is released only after a long timeout, and until then
the work handler thread cannot process other work.
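To illustrate the reference-counting problem, here is a minimal userspace
sketch in C.  It is a hypothetical model, not kernel code: ref_model,
ref_send_lap(), ref_send_done() and ref_destroy() are invented names, and a
loop counter stands in for the MAD timeout.  It assumes the destroyer waits
until every reference is dropped, and shows that canceling the outstanding
LAP send drops its reference immediately instead of after the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cm_id's reference count (not kernel code). */
struct ref_model {
	int refcount;   /* 1 for the owner, +1 while the LAP send is outstanding */
	bool lap_sent;  /* models lap_state == IB_CM_LAP_SENT */
};

/* Sending the LAP takes a reference that the send completion drops. */
static void ref_send_lap(struct ref_model *id)
{
	id->refcount++;
	id->lap_sent = true;
}

/* Send completion: APR received, timeout, or cancel all drop the reference. */
static void ref_send_done(struct ref_model *id)
{
	if (id->lap_sent) {
		id->lap_sent = false;
		id->refcount--;
	}
	assert(id->refcount >= 0);
}

/*
 * Drops the owner's reference and waits for the count to reach zero.
 * Returns how many simulated ticks the destroyer spent waiting; an
 * outstanding LAP only completes by timeout after timeout_ticks ticks.
 */
static int ref_destroy(struct ref_model *id, bool cancel_lap, int timeout_ticks)
{
	int waited = 0;

	if (cancel_lap)
		ref_send_done(id);  /* cancel completes the send right away */
	id->refcount--;             /* drop the owner's reference */
	while (id->refcount > 0) {
		if (++waited >= timeout_ticks)
			ref_send_done(id);  /* APR lost: only the timeout frees it */
	}
	return waited;
}
```

With a timeout of 1000 ticks, destroying an id that still has a LAP
outstanding waits 1000 ticks without the cancel and 0 ticks with it.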

This problem was reported by Moni Shoua <[email protected]> and
Amir Vadai <[email protected]>.

Signed-off-by: Sean Hefty <[email protected]>
---
Good catch.  However, I think we can simplify the fix to the patch below
(completely untested).  Please let me know if this solves the issue for you.

 drivers/infiniband/core/cm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1d9616b..79da42d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -888,6 +888,8 @@ retest:
                               NULL, 0, NULL, 0);
                break;
        case IB_CM_ESTABLISHED:
+               if (cm_id->lap_state == IB_CM_LAP_SENT)
+                       ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
                spin_unlock_irq(&cm_id_priv->lock);
                ib_send_cm_dreq(cm_id, NULL, 0);
                goto retest;
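
The control flow of the patched IB_CM_ESTABLISHED branch can be modeled the
same way.  This is again a hypothetical, simplified sketch rather than the
real cm_destroy_id(): the dm_* names are invented, each send is assumed to
hold one reference until it completes or is canceled, and the DREQ-sent
state is reduced to canceling the DREQ and moving to a terminal state, an
assumption made only for illustration.

```c
#include <assert.h>

/* Hypothetical states; they only mirror the names used in the patch. */
enum dm_state { DM_ESTABLISHED, DM_DREQ_SENT, DM_TIMEWAIT };
enum dm_lap { DM_LAP_IDLE, DM_LAP_SENT };

struct dm_id {
	enum dm_state state;
	enum dm_lap lap_state;
	int outstanding;  /* sends still holding a reference on the id */
	int cancels;      /* number of cancel calls issued */
};

/* Models canceling a send: its reference is dropped right away. */
static void dm_cancel(struct dm_id *id)
{
	id->cancels++;
	if (id->outstanding > 0)
		id->outstanding--;
}

/* Models sending a DREQ: takes a reference and changes state. */
static void dm_send_dreq(struct dm_id *id)
{
	id->outstanding++;
	id->state = DM_DREQ_SENT;
}

static void dm_destroy(struct dm_id *id)
{
retest:
	switch (id->state) {
	case DM_ESTABLISHED:
		/* The added check: cancel a pending LAP before sending the DREQ. */
		if (id->lap_state == DM_LAP_SENT)
			dm_cancel(id);
		dm_send_dreq(id);
		goto retest;
	case DM_DREQ_SENT:
		/* Assumed simplification: cancel the DREQ and stop. */
		dm_cancel(id);
		id->state = DM_TIMEWAIT;
		break;
	default:
		break;
	}
}
```

Destroying a model id in the established state with a LAP outstanding
issues two cancels (LAP, then DREQ) and leaves no sends outstanding; without
the lap_state check, the LAP send would still be holding its reference.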

