** Tags removed: verification-needed-jammy-linux-bluefield
** Tags added: verification-done-jammy-linux-bluefield

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2039869

Title:
  Devlink reload hangs: fix race and lock issue

Status in linux-bluefield package in Ubuntu:
  Invalid
Status in linux-bluefield source package in Jammy:
  Fix Committed

Bug description:
  Summary:
  Machine hangs when doing devlink reload

  How to reproduce:
  Host:
  [root@bu-lab24v ~]# echo '2' > /sys/class/net/ens2f0np0/device/sriov_numvfs  

  Arm:
  root@bu-lab24v-oob:~# uname -r
  5.15.0-1027-bluefield
  root@bu-lab24v-oob:~# devlink dev eswitch set pci/0000:03:00.0 mode switchdev
  root@bu-lab24v-oob:~# devlink dev reload pci/0000:03:00.0
  *Hangs*

  Arm dmesg:
  [ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
  [ 1089.760560]       Tainted: G           OE     5.15.0-1027-bluefield #29-Ubuntu
  [ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1089.790829] task:devlink         state:D stack:    0 pid: 8753 ppid:  5090 flags:0x00000004
  [ 1089.790838] Call trace:
  [ 1089.790840]  __switch_to+0xf8/0x150
  [ 1089.790857]  __schedule+0x2b8/0x790
  [ 1089.790865]  schedule+0x64/0x140
  [ 1089.790870]  schedule_preempt_disabled+0x18/0x24
  [ 1089.790874]  __mutex_lock.constprop.0+0x1a0/0x680
  [ 1089.790878]  __mutex_lock_slowpath+0x40/0x90
  [ 1089.790883]  mutex_lock+0x64/0x70
  [ 1089.790887]  devl_lock+0x1c/0x30
  [ 1089.790893]  mlx5_detach_device+0x58/0x190 [mlx5_core]
  [ 1089.791055]  mlx5_unload_one+0x40/0xe4 [mlx5_core]
  [ 1089.791177]  mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
  [ 1089.791318]  devlink_reload+0x214/0x290

  Fixes:
  Checking the OFED source code, we found that the devl trap groups
  change is missing and also needs to be backported to avoid the deadlock.

  void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
  {
  ...
  #ifdef HAVE_DEVL_PORT_REGISTER
  #ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
          devl_assert_locked(priv_to_devlink(dev));
  #else
          devl_lock(devlink);
  #endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
  #endif /* HAVE_DEVL_PORT_REGISTER */
          mutex_lock(&mlx5_intf_mutex);
  #ifdef HAVE_DEVL_PORT_REGISTER

  Related issue:
  #2032378 Devlink backport: fix race and lock issue

  The fix is to cherry-pick the commit below:
  commit 852e85a704c2e11c050bdea286bc438aba4f4a22
  Author: Jiri Pirko <j...@resnulli.us>
  Date:   Sat Jul 16 13:02:34 2022 +0200

      net: devlink: add unlocked variants of devling_trap*() functions

      Add unlocked variants of devl_trap*() functions to be used in drivers
      called-in with devlink->lock held.
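
  The call trace above suggests that the reload path already holds the
  devlink lock when mlx5_detach_device() calls devl_lock() again. The
  sketch below is a userspace analogy only, assuming a pthread mutex in
  place of devlink->lock. Every name in it (fake_devlink, fake_reload and
  the two fake_*_trap_groups_register helpers) is hypothetical and is not
  a kernel or mlx5 API. It uses pthread_mutex_trylock() so that the
  re-lock attempt reports EBUSY instead of hanging the way the kernel
  mutex does.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct devlink: one lock plus some state. */
struct fake_devlink {
	pthread_mutex_t lock;
	int trap_groups;
};

/* "Unlocked" variant: the caller must already hold dl->lock. */
static void fake_devl_trap_groups_register(struct fake_devlink *dl, int n)
{
	dl->trap_groups += n;
}

/*
 * Locking wrapper: takes dl->lock itself, then calls the unlocked variant.
 * A kernel mutex_lock() would block here if the caller already held the
 * lock; trylock is used so this sketch returns EBUSY instead of hanging.
 */
static int fake_devlink_trap_groups_register(struct fake_devlink *dl, int n)
{
	int err = pthread_mutex_trylock(&dl->lock);

	if (err)
		return err;
	fake_devl_trap_groups_register(dl, n);
	pthread_mutex_unlock(&dl->lock);
	return 0;
}

/*
 * Reload path that holds the lock for its whole duration. Calling the
 * locking wrapper from here is the pattern that deadlocks; calling the
 * unlocked variant is the pattern the cherry-picked commit makes possible.
 */
static int fake_reload(struct fake_devlink *dl, int use_unlocked)
{
	int err = 0;

	pthread_mutex_lock(&dl->lock);
	if (use_unlocked)
		fake_devl_trap_groups_register(dl, 1);
	else
		err = fake_devlink_trap_groups_register(dl, 1);
	pthread_mutex_unlock(&dl->lock);
	return err;
}
```

  With a struct fake_devlink initialised using PTHREAD_MUTEX_INITIALIZER,
  fake_reload(&dl, 0) returns EBUSY and leaves trap_groups unchanged,
  while fake_reload(&dl, 1) succeeds. The locking wrapper remains usable
  from callers that do not hold the lock.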

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
