Roland Dreier wrote:

I'll leave it to Sean and others who know the cma locking better than
I do to comment on the patch in detail, but a few notes:

- your patch is completely whitespace mangled so it would have to be
  applied by hand.  please look into configuring your mail client so
  that it can send patches without corrupting them.

- patches should be generated so they apply with 'patch -p1', so
  rather than what you have:

> --- drivers/infiniband/core/cma.c       2006-12-13 17:14:23.000000000 -0800
> +++ /tmp/cma.c  2007-10-03 00:48:32.000000000 -0700

  the paths should be more like

   --- a/drivers/infiniband/core/cma.c
   +++ b/drivers/infiniband/core/cma.c

Sorry, its been a while since I posted a patch. I am attaching a copy of it (not sure if patch attachments are ok), generated with "diff -Naurp".

- I wonder which tree you generated your patch against: it seems to
  be modifying cma_destroy_listen(), but you have the header:

> @@ -624,6 +624,7 @@

  and cma_destroy_listen() is nowhere near line 624 in my tree.  (And
  it would be nice to use the '-p' option of diff to put the function
  name there for easier reviewing)
This was against 2.6.20.

- Your comment doesn't make it clear to me that dropping and
  reacquiring the lock is safe; can you explain why nothing else
  could come along while the lock is dropped and mess things up?

  It seems rdma_destroy_id() has the same pattern, but it's not clear
  to me in the code:

        mutex_lock(&lock);
        if (id_priv->cma_dev) {
                mutex_unlock(&lock);
                // why can't the device be hot-unplugged here??
                switch (rdma_node_get_transport(id->device->node_type)) {

  what guarantees that the device does not disappear before it is
  dereferenced in the switch statement.  This would be a separate bug
  but we probably shouldn't introduce another instance of it
  (assuming I'm correct).

- R.

Yes, rdma_destroy_id() has the same thing, where the lock is dropped; that was the inspiration for this fix too. I believe that by setting the cmid state to CMA_DESTROYING and still keeping it on the device's list, it should be ok to drop the lock and have a racing cma_process_remove() silently ignore this cmid.

Also notice that the destroying thread has to do a cma_detach_from_dev() to dec the refcount on the device before the device structure can be freed up.

By no means do I understand all the intricacies of the cma code, hopefully Sean/others will review and comment.

Thanks.

Kanoj


--- a/drivers/infiniband/core/cma.c     2007-10-03 11:24:12.000000000 -0700
+++ b/drivers/infiniband/core/cma.c     2007-10-03 11:27:16.000000000 -0700
@@ -624,6 +624,7 @@ static void cma_destroy_listen(struct rd
        cma_exch(id_priv, CMA_DESTROYING);
 
        if (id_priv->cma_dev) {
+               mutex_unlock(&lock);
                switch (rdma_node_get_transport(id_priv->id.device->node_type)) 
{
                case RDMA_TRANSPORT_IB:
                        if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -636,6 +637,7 @@ static void cma_destroy_listen(struct rd
                default:
                        break;
                }
+               mutex_lock(&lock);
                cma_detach_from_dev(id_priv);
        }
        list_del(&id_priv->listen_list);
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to