I thought about this some more and realized we can simply always take
vlan_mutex after the RTNL, rather than adding another delayed work item
or anything like that.  So I queued up the patch below.  The issue you
found causes a clear lockdep warning, and with this patch applied the
warning goes away for me, so I'm confident this fixes the problem.

---

IPoIB: Fix deadlock between ipoib_open() and child interface create

Fix a deadlock between child interface creation/deletion and ipoib
start/stop.  The former takes vlan_mutex, and then might take RTNL via
register_netdev()/unregister_netdev().  The latter is executed with
RTNL held, and tries to take vlan_mutex, which can lead to an AB-BA
deadlock.

Fix this by having the child interface creation/deletion code take the
RTNL first so vlan_mutex always nests inside RTNL.  We can use
register_netdevice() for child interfaces because we form the
interface name from the parent interface and hence don't need the '%'
expansion of register_netdev().

Reported-by: Yossi Etigin <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
---
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 2cf1a40..5a76a55 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -61,6 +61,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
        ppriv = netdev_priv(pdev);
 
+       rtnl_lock();
        mutex_lock(&ppriv->vlan_mutex);
 
        /*
@@ -111,7 +112,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
                goto device_init_failed;
        }
 
-       result = register_netdev(priv->dev);
+       result = register_netdevice(priv->dev);
        if (result) {
                ipoib_warn(priv, "failed to initialize; error %i", result);
                goto register_failed;
@@ -134,12 +135,13 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
        list_add_tail(&priv->list, &ppriv->child_intfs);
 
        mutex_unlock(&ppriv->vlan_mutex);
+       rtnl_unlock();
 
        return 0;
 
 sysfs_failed:
        ipoib_delete_debug_files(priv->dev);
-       unregister_netdev(priv->dev);
+       unregister_netdevice(priv->dev);
 
 register_failed:
        ipoib_dev_cleanup(priv->dev);
@@ -149,6 +151,7 @@ device_init_failed:
 
 err:
        mutex_unlock(&ppriv->vlan_mutex);
+       rtnl_unlock();
        return result;
 }
 
@@ -162,10 +165,11 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 
        ppriv = netdev_priv(pdev);
 
+       rtnl_lock();
        mutex_lock(&ppriv->vlan_mutex);
        list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
                if (priv->pkey == pkey) {
-                       unregister_netdev(priv->dev);
+                       unregister_netdevice(priv->dev);
                        ipoib_dev_cleanup(priv->dev);
                        list_del(&priv->list);
                        free_netdev(priv->dev);
@@ -175,6 +179,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
                }
        }
        mutex_unlock(&ppriv->vlan_mutex);
+       rtnl_unlock();
 
        return ret;
 }
-- 
1.6.0.4
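
For anyone who wants to see the ordering rule from the changelog in
isolation, here is a minimal userspace sketch using pthreads.  It is not
kernel code and not part of the patch: the two mutexes are hypothetical
stand-ins for the RTNL and vlan_mutex, and the names child_add_path(),
open_path() and run_lock_order_demo() are made up for illustration.  Both
paths take the outer lock first and the inner lock second, so neither can
end up holding one lock while waiting for the other in the opposite order.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: "rtnl" models the RTNL, "vlan_mutex" the
 * per-parent mutex.  These are ordinary pthread mutexes, not kernel locks. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vlan_mutex = PTHREAD_MUTEX_INITIALIZER;

static long child_count;	/* protected by vlan_mutex */
static long open_walks;		/* protected by vlan_mutex */

/* Models child interface creation after the fix: RTNL, then vlan_mutex. */
static void *child_add_path(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++) {
		pthread_mutex_lock(&rtnl);
		pthread_mutex_lock(&vlan_mutex);
		child_count++;
		pthread_mutex_unlock(&vlan_mutex);
		pthread_mutex_unlock(&rtnl);
	}
	return NULL;
}

/* Models open/stop: entered with the RTNL held, then takes vlan_mutex. */
static void *open_path(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++) {
		pthread_mutex_lock(&rtnl);
		pthread_mutex_lock(&vlan_mutex);
		open_walks++;
		pthread_mutex_unlock(&vlan_mutex);
		pthread_mutex_unlock(&rtnl);
	}
	return NULL;
}

/* Runs both paths concurrently; returns the number of "children" added,
 * or -1 if a thread could not be created. */
long run_lock_order_demo(int iterations)
{
	pthread_t a, b;

	child_count = 0;
	open_walks = 0;

	if (pthread_create(&a, NULL, child_add_path, &iterations))
		return -1;
	if (pthread_create(&b, NULL, open_path, &iterations)) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return child_count;
}
```

Calling run_lock_order_demo(1000) from a small main() should return once
both threads finish.  Swapping the two lock calls in child_add_path()
reproduces the pre-patch ordering, and the program can then hang on an
unlucky interleaving.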
