Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroying listen requests

Kanoj Sarcar Thu, 11 Oct 2007 12:21:31 -0700

Sean Hefty wrote:

cma_process_remove() -> cma_remove_id_dev() generates the event for
device removal. This is ok to do as long as it can be guaranteed that a
racing rdma_destroy_id() has not returned back to caller, correct?


IE, the caller must be willing to accept device removal events until its
rdma_destroy_id() returns.


Correct - rdma_destroy_id() blocks until all callbacks from the rdma_cm have
completed.

If so, why is cma_remove_id_dev() trying so hard to not generate the
event when rdma_destroy_id() has gotten to the point of setting
CMA_DESTROYING? Could it not just generate the event, happy in the
knowledge that the refcount bump done by cma_process_remove() will
prevent the rdma_destroy_id() call from returning?


There are two ways for the user to destroy an rdma_cm_id.  They can either call
rdma_destroy_id() directly or return a non-zero value from a callback.  In order
to support the latter, all callbacks to a user on the same rdma_cm_id must be
serialized, and once the user has returned a non-zero value no further callbacks
can occur.  (Otherwise the user wouldn't know when it was safe to deallocate
their connection context.)

Since a device removal can occur at any point, the device removal callback must
be serialized with any other callback in progress.  It does this by marking that
the device has been removed.  This prevents any new callbacks from being
invoked, but a callback may already be in progress.  The device removal code
waits for that callback to complete.  After it completes, it needs to see if the
user wants to destroy the rdma_cm_id - meaning they returned a non-zero value
from the first callback.  If so, then the device removal callback cannot be
invoked.

One other point is that all event callbacks for a given rdma_cm_id end up being
serialized by default.  Only the device removal event requires special handling,
since that thread can run at any time.  If you look at some of the callback
handlers (named *_handler), you'll see calls to disable/enable remove, which
provides this serialization.
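
The removal sequence described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not the
rdma_cm code: every name in it (model_id, model_callback_begin, and so on) is
hypothetical, locking is left out, and the point where the real removal
thread would block waiting for an in-flight callback is represented by an
assertion rather than an actual wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for this model; only the idea of a "destroying"
 * state is taken from the discussion above. */
enum model_state { MODEL_ACTIVE, MODEL_DEVICE_REMOVAL, MODEL_DESTROYING };

struct model_id {
    enum model_state state;
    int in_callback;     /* user callbacks currently in progress */
    int removal_events;  /* device removal callbacks delivered */
};

void model_init(struct model_id *id)
{
    id->state = MODEL_ACTIVE;
    id->in_callback = 0;
    id->removal_events = 0;
}

/* An ordinary event wants to call the user. It is refused once removal
 * or destruction has been marked on the id. */
bool model_callback_begin(struct model_id *id)
{
    if (id->state != MODEL_ACTIVE)
        return false;
    id->in_callback++;
    return true;
}

/* The user's callback returned `ret`; non-zero means "destroy this id". */
void model_callback_end(struct model_id *id, int ret)
{
    assert(id->in_callback > 0);
    id->in_callback--;
    if (ret != 0)
        id->state = MODEL_DESTROYING;
}

/* Device removal, step 1: mark the id so that no new callbacks start.
 * Returns false if destruction is already under way. */
bool model_remove_begin(struct model_id *id)
{
    if (id->state == MODEL_DESTROYING)
        return false;
    id->state = MODEL_DEVICE_REMOVAL;
    return true;
}

/* Device removal, step 2: runs only after any in-flight callback has
 * finished (a real implementation would block here instead of asserting).
 * The removal event is delivered only if the user did not ask for
 * destruction in the meantime. */
bool model_remove_finish(struct model_id *id)
{
    assert(id->in_callback == 0);
    if (id->state != MODEL_DEVICE_REMOVAL)
        return false;
    id->removal_events++;
    return true;
}
```

Interleaving the calls by hand shows the case in question: if a callback is
in progress when model_remove_begin() runs and that callback then returns
non-zero, model_remove_finish() returns false and no removal event is
delivered. If the callback returns zero instead, the removal event goes
through.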

- Sean

Ok, thanks, I see how CMA_DESTROYING is used to correctly implement the callback-initiated destruct.

I don't understand the reason for callback-initiated destruct in the first place, but that's too off topic ...

With this new information, I will revisit the thread posted at http://lists.openfabrics.org/pipermail/general/2007-September/040614.html to see whether the problem being discussed there is really nonexistent.


Kanoj
