On Thu, Sep 05, 2013 at 10:14:51AM -0500, Steve Wise wrote: > Roland, what do you think? > > As I've said, I think we should go ahead with using the rtnl lock in > the core. Is there a complete patch available for review? looks > like the original was a partial fix.
I guess I should realize that when no one jumps at fixing my issues for me that they probably aren't simple to fix. The solution that Cong proposed was to acquire rtnl_lock() before acquiring the infiniband device_mutex, and his partial patch did that in ib_register_client(). The problem is that you would also need to do that in ib_unregister_client(), ib_register_device(), and ib_unregister_device(), and that brings us back to the original problem which was that cxgb3 was holding the rtnl_lock() when it called ib_register_device(). Thus with the proposed fix I believe cxgb3 would already be holding the rtnl_lock() and then call ib_register_device() which would try to acquire the rtnl_lock() again and deadlock for a different reason. Actually how does this currently work? ib_register_device() calls client->add() for each client in the list which should call ipoib_add_one() which calls register_netdev(). Shouldn't that also deadlock in the cxgb3 case? Also while digging through this I think I see another bug which is that ipoib_dev_cleanup() can be called from ipoib_add_port() but in the current code ipoib_add_port() is not holding the rtnl_lock() which appears to be a requirement of ipoib_dev_cleanup(). Sigh... I'm going to stop looking at this for now and hopefully someone can propose a better solution to this issue. Thanks, Shawn -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
