Roland Dreier wrote:
I'll leave it to Sean and others who know the cma locking better than
I do to comment on the patch in detail, but a few notes:
- your patch is completely whitespace mangled so it would have to be
applied by hand. please look into configuring your mail client so
that it can send patches without corrupting them.
- patches should be generated so they apply with 'patch -p1', so
rather than what you have:
> --- drivers/infiniband/core/cma.c 2006-12-13 17:14:23.000000000 -0800
> +++ /tmp/cma.c 2007-10-03 00:48:32.000000000 -0700
the paths should be more like
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
Sorry, its been a while since I posted a patch. I am attaching a copy of
it (not sure if patch attachments are ok), generated with "diff -Naurp".
- I wonder which tree you generated your patch against: it seems to
be modifying cma_destroy_listen(), but you have the header:
> @@ -624,6 +624,7 @@
and cma_destroy_listen() is nowhere near line 624 in my tree. (And
it would be nice to use the '-p' option of diff to put the function
name there for easier reviewing)
This was against 2.6.20.
- Your comment doesn't make it clear to me that dropping and
reacquiring the lock is safe; can you explain why nothing else
could come along while the lock is dropped and mess things up?
It seems rdma_destroy_id() has the same pattern, but it's not clear
to me in the code:
mutex_lock(&lock);
if (id_priv->cma_dev) {
mutex_unlock(&lock);
// why can't the device be hot-unplugged here??
switch (rdma_node_get_transport(id->device->node_type)) {
what guarantees that the device does not disappear before it is
dereferenced in the switch statement. This would be a separate bug
but we probably shouldn't introduce another instance of it
(assuming I'm correct).
- R.
Yes, rdma_destroy_id() has the same thing, where the lock is dropped;
that was the inspiration for this fix too. I believe that by setting the
cmid state to CMA_DESTROYING and still keeping it on the device's list,
it should be ok to drop the lock and have a racing cma_process_remove()
silently ignore this cmid.
Also notice that the destroying thread has to do a cma_detach_from_dev()
to dec the refcount on the device before the device structure can be
freed up.
By no means do I understand all the intricacies of the cma code,
hopefully Sean/others will review and comment.
Thanks.
Kanoj
--- a/drivers/infiniband/core/cma.c 2007-10-03 11:24:12.000000000 -0700
+++ b/drivers/infiniband/core/cma.c 2007-10-03 11:27:16.000000000 -0700
@@ -624,6 +624,7 @@ static void cma_destroy_listen(struct rd
cma_exch(id_priv, CMA_DESTROYING);
if (id_priv->cma_dev) {
+ mutex_unlock(&lock);
switch (rdma_node_get_transport(id_priv->id.device->node_type))
{
case RDMA_TRANSPORT_IB:
if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -636,6 +637,7 @@ static void cma_destroy_listen(struct rd
default:
break;
}
+ mutex_lock(&lock);
cma_detach_from_dev(id_priv);
}
list_del(&id_priv->listen_list);
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general