On 9/6/2013 6:19 PM, Shawn Bohrer wrote:
On Thu, Sep 05, 2013 at 10:14:51AM -0500, Steve Wise wrote:
Roland, what do you think?

As I've said, I think we should go ahead with using the rtnl lock in
the core.  Is there a complete patch available for review?  It looks
like the original was a partial fix.
I guess I should realize that when no one jumps at fixing my issues
for me that they probably aren't simple to fix.  The solution that
Cong proposed was to acquire rtnl_lock() before acquiring the
infiniband device_mutex, and his partial patch did that in
ib_register_client().  The problem is that you would also need to do
that in ib_unregister_client(), ib_register_device(), and
ib_unregister_device(), and that brings us back to the original
problem which was that cxgb3 was holding the rtnl_lock() when it
called ib_register_device().  Thus with the proposed fix I believe
cxgb3 would already be holding the rtnl_lock() and then call
ib_register_device() which would try to acquire the rtnl_lock() again
and deadlock for a different reason.
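
To make the recursive-locking concern concrete, here is a minimal
user-space sketch.  It is not kernel code: a pthread error-checking
mutex stands in for the rtnl lock so that the second acquisition
returns EDEADLK instead of hanging, and every fake_* name is
hypothetical and made up for illustration.

```c
/* Ask for POSIX.1-2008 so PTHREAD_MUTEX_ERRORCHECK is declared. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for the kernel's rtnl lock. */
static pthread_mutex_t fake_rtnl;

/*
 * Initialise the stand-in as an error-checking mutex.  Unlike the real
 * rtnl mutex, a second lock attempt by the owning thread then fails
 * with EDEADLK rather than blocking forever.
 */
static int fake_rtnl_init(void)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(&fake_rtnl, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/*
 * Models ib_register_device() under the proposed fix (an assumption
 * about how the full patch would look): the core takes the lock itself.
 */
static int fake_register_device(void)
{
    int ret = pthread_mutex_lock(&fake_rtnl);

    if (ret)
        return ret;   /* EDEADLK when the caller already holds the lock */
    /* ... device registration would happen here ... */
    pthread_mutex_unlock(&fake_rtnl);
    return 0;
}

/*
 * Models the cxgb3 path described above: the driver already holds the
 * lock when it calls into the registration function.
 */
static int fake_driver_path(void)
{
    int ret = pthread_mutex_lock(&fake_rtnl);

    if (ret)
        return ret;
    ret = fake_register_device();   /* second acquisition, same thread */
    pthread_mutex_unlock(&fake_rtnl);
    return ret;
}
```

After fake_rtnl_init(), calling fake_register_device() on its own
succeeds, while fake_driver_path() reports EDEADLK from the nested
call.  With a non-recursive kernel mutex the same nesting would simply
block forever.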

Actually how does this currently work?  ib_register_device() calls
client->add() for each client in the list which should call
ipoib_add_one() which calls register_netdev().  Shouldn't that also
deadlock in the cxgb3 case?

cxgb3 is an iWARP device and doesn't support IPoIB.


Also while digging through this I think I see another bug which is
that ipoib_dev_cleanup() can be called from ipoib_add_port() but in
the current code ipoib_add_port() is not holding the rtnl_lock() which
appears to be a requirement of ipoib_dev_cleanup().

Sigh...  I'm going to stop looking at this for now and hopefully
someone can propose a better solution to this issue.

I can help with this, but I'm waiting for Roland to chime in.

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
