There's an undesirable interaction between issuing MRAs to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA
with the ib_cm.  (The ib_cm will send an MRA if it receives a
duplicate REQ.)  The rdma_cm will then create a new rdma_cm_id and
give that to the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full,
it destroys the rdma_cm_id, which in turn destroys the ib_cm_id.
The ib_cm then generates a REJ because the state of the ib_cm_id has
changed from REQ received to MRA sent.

Defer queuing the MRA until after the user of the rdma_cm has
examined the connection request.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
---
This problem was detected while debugging an MPI application running
over uDAPL.

This patch is also available at:

        git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
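
For illustration only, here is a minimal user-space sketch of the
interaction described in the changelog.  It is not ib_cm code: the
enum, struct and function names are all hypothetical, and the
assumption that destroying an id still in the REQ-received state sends
no REJ is inferred from the changelog rather than taken from the ib_cm
source.

```c
#include <stdbool.h>

/* Hypothetical, simplified passive-side states; not the ib_cm enum. */
enum toy_state { TOY_REQ_RCVD, TOY_MRA_SENT };

struct toy_cm_id {
	enum toy_state state;
	bool rej_sent;
};

/* Queuing an MRA moves the id out of the REQ-received state. */
static void toy_send_mra(struct toy_cm_id *id)
{
	id->state = TOY_MRA_SENT;
}

/*
 * Assumption (inferred from the changelog): destroying an id that is
 * still in REQ-received sends nothing, while destroying one that is
 * in MRA-sent generates a REJ.
 */
static void toy_destroy(struct toy_cm_id *id)
{
	if (id->state == TOY_MRA_SENT)
		id->rej_sent = true;
}

/*
 * Model of the request path.  mra_first selects the old ordering (MRA
 * queued before the user sees the request); user_ret stands in for the
 * user's event handler return value, non-zero when the backlog is full.
 * Returns true if a REJ was generated.
 */
static bool toy_handle_req(bool mra_first, int user_ret)
{
	struct toy_cm_id id = { TOY_REQ_RCVD, false };

	if (mra_first)
		toy_send_mra(&id);
	if (user_ret) {
		/* User refused the request: the id is torn down. */
		toy_destroy(&id);
		return id.rej_sent;
	}
	if (!mra_first)
		toy_send_mra(&id);
	return id.rej_sent;
}
```

Under these assumptions, toy_handle_req(true, -1) reports a REJ (the
old ordering with a full backlog), while toy_handle_req(false, -1)
does not, which is the behaviour the deferred MRA is meant to give.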

 drivers/infiniband/core/cma.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0751697..98e1b38 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1100,7 +1100,6 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
                event.param.ud.private_data_len =
                                IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset;
        } else {
-               ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);
                conn_id = cma_new_conn_id(&listen_id->id, ib_event);
                cma_set_req_event_data(&event, &ib_event->param.req_rcvd,
                                       ib_event->private_data, offset);
@@ -1122,8 +1121,18 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
        cm_id->cm_handler = cma_ib_handler;
 
        ret = conn_id->id.event_handler(&conn_id->id, &event);
-       if (!ret)
+       if (!ret) {
+               /*
+                * Acquire mutex to prevent user executing rdma_destroy_id()
+                * while we're accessing the cm_id.
+                */
+               mutex_lock(&lock);
+               if (cma_comp(conn_id, CMA_CONNECT) &&
+                   !cma_is_ud_ps(conn_id->id.ps))
+                       ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);
+               mutex_unlock(&lock);
                goto out;
+       }
 
        /* Destroy the CM ID by returning a non-zero value. */
        conn_id->cm_id.ib = NULL;