Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().

Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d55529 ("iavf: kill "legacy-rx" for good").

Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.

Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.

Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.

The combination of waiting for reset and the new locking requirements
creates a deadlock: the callback holds the lock and waits for reset_task,
but reset_task is blocked waiting for the same lock:

  Thread 1 (callback)               Thread 2 (reset_task)
  -------------------               ---------------------
  netdev_lock()
  ndo_change_mtu() or ethtool op
    iavf_schedule_reset()
    iavf_wait_for_reset()           iavf_reset_task()
      waiting...                      netdev_lock() <- DEADLOCK

Reproducer:

  # echo 16 > /sys/class/net/$PF/device/sriov_numvfs
  # ip link set $VF up
  # ip link set $VF mtu 5000
  RTNETLINK answers: Device or resource busy

  # dmesg | tail -1
  iavf: MTU change interrupted waiting for reset

Fix this by temporarily releasing the lock while waiting for reset to
complete. This follows the pattern used elsewhere in the kernel (e.g.,
do_set_master() releases rtnl_lock before calling ndo_add_slave()).

Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink 
operations")
Signed-off-by: Petr Oros <[email protected]>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c 
b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 8aa6e92c16431f..d7738fb8fa60bc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter 
*adapter)
  * iavf_wait_for_reset - Wait for reset to finish.
  * @adapter: board private structure
  *
+ * The iavf driver selects NET_SHAPER, so callbacks that trigger reset are
+ * always called with netdev instance lock held, while reset_task also needs
+ * this lock. Release the lock while waiting to avoid deadlock.
+ *
  * Returns 0 if reset finished successfully, negative on timeout or interrupt.
  */
 int iavf_wait_for_reset(struct iavf_adapter *adapter)
 {
-       int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
-                                       !iavf_is_reset_in_progress(adapter),
-                                       msecs_to_jiffies(5000));
+       struct net_device *netdev = adapter->netdev;
+       int ret;
+
+       netdev_unlock(netdev);
+       ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
+                                              
!iavf_is_reset_in_progress(adapter),
+                                              msecs_to_jiffies(5000));
+       netdev_lock(netdev);
 
        /* If ret < 0 then it means wait was interrupted.
         * If ret == 0 then it means we got a timeout while waiting
-- 
2.52.0

Reply via email to