Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

Kanoj Sarcar Wed, 10 Oct 2007 16:17:14 -0700

Sean Hefty wrote:

* in your patch, I suggest taking out the warning printk from
cma_listen_on_dev() when the listener create attempt fails; it might be
that the device is out of resources etc. Since the code takes care of
this situation pretty well, I don't see a need for the printk.


That's easy enough to do.

* I don't see a reason for the internal_id and the device listeners
getting a refcount on the wildcard listener. Because, even without
these, it is guaranteed that the wildcard listener will exist at least
as long as any of the children device listener's are around, by looking
at the logic in rdma_destroy_id(). Can you provide some logic for
requring this then?


There are 2 ways to destroy an internal_id: destroying its parent (the wildcard
listen) or removing its device.  When a device is removed, the internal_id is
removed from its parent list to ensure that it is only destroyed once.  If the
parent were to be destroyed at this point, it would destroy any remaining
children, then be freed.  The internal_id still exists however, and could be
generating connection request events, which expects to fine the parent.  The
reference ensures that the parent stays around as long as any children remain.


Ok, makes sense.

* not that I am very worried (and I suggesting resolving this thru
another subsequent patch if it is really a problem), but I think device
removal is still racy wrt non wildcard listeners. Here's the sequence:
cma_process_remove()->cma_remove_id_dev() decides it will
rdma_destroy_id() the listener id, and at the same time a process
context rdma_destroy_id() decides it is going to do the same. There are
probably various ways to take care of this, the simple one might be for
rdma_destroy_id() to look at the "state" and make a decision about who
gets to destroy.


A user cannot both return non-zero from their callback (indicating that the
rdma_cm should destroy the id) and call rdma_destroy_id() on the same id.  This
is equivalent to call rdma_destroy_id() twice.  It's not too difficult for the
user to avoid this.

- Sean

I don't understand your response. ucma.c for example can callrdma_create_id() and rdma_destroy_id(), correct? What says that whenucma.c does a rdma_destroy_id() on a nonwildcard listener, a deviceremoval is not attempting to do the same on the listener? If this ispossible, the code paths I mentioned above can still trigger a doubledestruct on a listener, correct?


Kanoj
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

Reply via email to