Hello,

Here's a possible (and easily reproducible) deadlock scenario involving cma.c's global mutex "lock" while destroying a listener.

Assume the provider has done an IW_CM_EVENT_CONNECT_REQUEST upcall on behalf of the listener; iwcm.c:cm_event_handler() will then bump the listener's refcount and schedule iw_cm_wq to execute cm_work_handler().

cma.c:rdma_destroy_id() is then invoked on the listener, leading to the call chain cma_cancel_operation():cma_cancel_listens():cma_destroy_listen():iw_destroy_cm_id() with the global "lock" held; iw_destroy_cm_id() will do wait_for_completion(), waiting for the listener refcount to get to 0.

When iw_cm_wq gets to run, it executes cm_work_handler():process_event():cm_conn_req_handler():iw_conn_req_handler(), which tries to get the global "lock" (held as described previously) and goes to sleep. The deadlock arises because iw_cm_wq must reach cm_work_handler():iwcm_deref_id() to drop the refcount that the destroy path is waiting on, and it cannot get there while it is blocked on the mutex.

Notice that cm_conn_req_handler() tries to exit early if listener destruct has started (by checking IW_CM_STATE_LISTEN). iw_conn_req_handler() does similar checks on CMA_LISTEN. But there is a race window with the destruct path: the upcall path can pass these checks and then wait for the mutex that the destruct path has acquired.
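The race window can be illustrated with a small user-space model. This is only a sketch: it replays one unlucky interleaving in a single thread, and every name in it (model_state, model_lock, race_window_model) is made up for illustration and is not taken from cma.c or iwcm.c. pthread_mutex_trylock() is used in place of a blocking lock so that the model reports "would block" instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the listener state and cma.c's global "lock". */
enum model_state { MODEL_LISTEN, MODEL_DESTROYING };

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static enum model_state listener_state = MODEL_LISTEN;

/*
 * Replays one interleaving of the upcall path and the destruct path.
 * Returns 1 if the upcall passed its early-exit check and would then
 * have blocked on the mutex, 0 otherwise.
 */
int race_window_model(void)
{
    int check_passed;
    int rc;

    listener_state = MODEL_LISTEN;

    /* Upcall path, step 1: early-exit check done without the mutex. */
    check_passed = (listener_state == MODEL_LISTEN);

    /* Destruct path runs inside the window: takes the mutex, flips state. */
    pthread_mutex_lock(&model_lock);
    listener_state = MODEL_DESTROYING;

    /* Upcall path, step 2: tries to take the mutex the destruct path holds. */
    rc = pthread_mutex_trylock(&model_lock);
    if (rc == 0)
        pthread_mutex_unlock(&model_lock); /* not expected for a default mutex */

    /* Destruct path releases the mutex so the model can be run again. */
    pthread_mutex_unlock(&model_lock);

    return check_passed && rc == EBUSY;
}
```

In the real code the second step is a blocking mutex_lock(), and the destruct path does not release the mutex until the upcall path has finished, which is what turns this window into a deadlock.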

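The appended patch fixes the problem by dropping the global "lock" around the cm_id destroy calls and retaking it before cma_detach_from_dev().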
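For readers who want to experiment outside the kernel, here is a minimal pthreads sketch of the same locking pattern. It is an analogy under stated assumptions, not kernel code: model_lock stands in for the global "lock", the model_completion helpers stand in for the kernel completion that iw_destroy_cm_id() waits on, and all function names are hypothetical. The destroy side releases the mutex before waiting, as the patch does, so the work handler can take it and drop its reference.

```c
#include <assert.h>
#include <pthread.h>

/* User-space model only: every name below is hypothetical, not a kernel API. */
struct model_completion {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    int done;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER; /* models "lock" */
static struct model_completion destroy_started; /* forces the bad ordering */
static struct model_completion refs_dropped;    /* models refcount reaching 0 */
static int events_handled;

static void model_init(struct model_completion *c)
{
    pthread_mutex_init(&c->mtx, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void model_complete(struct model_completion *c)
{
    pthread_mutex_lock(&c->mtx);
    c->done = 1;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->mtx);
}

static void model_wait(struct model_completion *c)
{
    pthread_mutex_lock(&c->mtx);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->mtx);
    pthread_mutex_unlock(&c->mtx);
}

/* Models the work queue: needs the global lock, then drops its reference. */
static void *model_work_handler(void *arg)
{
    (void)arg;
    model_wait(&destroy_started);     /* run only after destroy holds the lock */
    pthread_mutex_lock(&model_lock);  /* models iw_conn_req_handler() locking */
    events_handled++;
    pthread_mutex_unlock(&model_lock);
    model_complete(&refs_dropped);    /* models the final iwcm_deref_id() */
    return NULL;
}

/* Models the destroy path with the fix applied; returns events handled. */
int destroy_listener_model(void)
{
    pthread_t worker;

    model_init(&destroy_started);
    model_init(&refs_dropped);
    events_handled = 0;
    if (pthread_create(&worker, NULL, model_work_handler, NULL) != 0)
        return -1;

    pthread_mutex_lock(&model_lock);   /* destroy path enters with lock held */
    model_complete(&destroy_started);
    pthread_mutex_unlock(&model_lock); /* the fix: drop the lock before waiting */
    model_wait(&refs_dropped);         /* models wait_for_completion() */
    pthread_mutex_lock(&model_lock);   /* retake it for the remaining teardown */
    pthread_mutex_unlock(&model_lock);

    pthread_join(worker, NULL);
    return events_handled;
}
```

Removing the unlock/lock pair around model_wait() reproduces the hang in this model: the worker sleeps on model_lock while the destroy side sleeps on refs_dropped.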

Thanks.

Kanoj

--- drivers/infiniband/core/cma.c       2006-12-13 17:14:23.000000000 -0800
+++ /tmp/cma.c  2007-10-03 00:48:32.000000000 -0700
@@ -624,6 +624,7 @@
       cma_exch(id_priv, CMA_DESTROYING);

       if (id_priv->cma_dev) {
+               mutex_unlock(&lock);
               switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
               case RDMA_TRANSPORT_IB:
                       if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -636,6 +637,7 @@
               default:
                       break;
               }
+               mutex_lock(&lock);
               cma_detach_from_dev(id_priv);
       }
       list_del(&id_priv->listen_list);
