During a review of the SRU request, it was found that the backport of
commit 4cbe4dac82e423 did not remove the definition of
MLX4_INTERFACE_STATE_SHUTDOWN, yet it added MLX4_INTERFACE_STATE_NOWAIT
with the same value as MLX4_INTERFACE_STATE_SHUTDOWN.
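A minimal C sketch of why a shared value is a problem. The SKETCH_* names
and bit positions are hypothetical, chosen only for illustration; they are
not the definitions in the mlx4 headers. Any code that still tests the old
SHUTDOWN bit also matches a device that only has NOWAIT set:

```c
#include <stdbool.h>

/*
 * Illustrative flag layout only: the bit positions are assumptions, not
 * copied from include/linux/mlx4/device.h. It models a stale SHUTDOWN
 * flag left in place while a new NOWAIT flag reuses the same bit.
 */
enum sketch_interface_state {
	SKETCH_STATE_UP       = 1 << 0,
	SKETCH_STATE_DELETION = 1 << 1,
	SKETCH_STATE_SHUTDOWN = 1 << 2, /* stale definition */
	SKETCH_STATE_NOWAIT   = 1 << 2, /* new flag, same bit */
};

/* True if any bit is shared, i.e. the two flags cannot be told apart. */
static bool flags_overlap(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}

/* A leftover check for SHUTDOWN also fires when only NOWAIT was set. */
static bool shutdown_check_fires(unsigned int interface_state)
{
	return (interface_state & SKETCH_STATE_SHUTDOWN) != 0;
}
```

Removing the stale flag and every reference to it leaves that bit to
NOWAIT alone, so no leftover check can misread it.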

Commit b4353708f5a is needed as a prerequisite.  It removes
MLX4_INTERFACE_STATE_SHUTDOWN and all references to it.

I built a v2 Yakkety test kernel containing both commits, which can be
downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1672785/yakkety/

If this test kernel fixes the bug, I will re-submit the SRU request.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672785

Title:
  [Hyper-V][Mellanox] net/mlx4_core: Avoid delays during VF driver
  device shutdown

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Yakkety:
  In Progress
Status in linux source package in Zesty:
  Fix Committed

Bug description:
  Mellanox has submitted the following patch upstream that's important
  for SR-IOV in Azure.

  Please integrate it into the Mellanox mlx4 drivers for lts-xenial,
  HWE, Zesty, and Azure custom.

  https://patchwork.ozlabs.org/patch/738305/

  From: Jack Morgenstein <[email protected]>

  Some Hypervisors detach VFs from VMs by instantly causing an FLR event
  to be generated for a VF.

  In the mlx4 case, this will cause that VF's comm channel to be disabled
  before the VM has an opportunity to invoke the VF device's "shutdown"
  method.

  For such Hypervisors, there is a race condition between the VF's
  shutdown method and its internal-error detection/reset thread.

  The internal-error detection/reset thread (which runs every 5 seconds) also
  detects a disabled comm channel. If the internal-error detection/reset
  flow wins the race, we still get delays (while that flow tries repeatedly
  to detect comm-channel recovery).

  The cited commit (d585df1c5ccf) fixed the command timeout problem for
  the case where the internal-error detection/reset flow loses the race.

  This commit avoids the unneeded delays when the internal-error
  detection/reset flow wins.
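The decision this fix adds can be modelled in a few lines of C. This is a
simplified, single-threaded sketch under stated assumptions: sketch_vf,
sketch_begin_shutdown, sketch_reset_flow and the SKETCH_STATE_NOWAIT bit
are hypothetical names, not the mlx4 driver's code, and locking is omitted.
The shutdown path sets a NOWAIT flag, and the reset flow, which would
otherwise keep polling a disabled comm channel, checks that flag on every
iteration and gives up at once:

```c
#include <stdbool.h>

/* Hypothetical bit standing in for MLX4_INTERFACE_STATE_NOWAIT. */
#define SKETCH_STATE_NOWAIT (1u << 2)

/* Hypothetical view of a VF as seen by the detection/reset thread. */
struct sketch_vf {
	unsigned int interface_state; /* bit mask of SKETCH_STATE_* flags */
	bool comm_channel_up;         /* false once the FLR has disabled it */
};

/* Shutdown path: mark the VF so the reset flow stops waiting.
 * A kernel driver would do this under a lock; omitted here. */
static void sketch_begin_shutdown(struct sketch_vf *vf)
{
	vf->interface_state |= SKETCH_STATE_NOWAIT;
}

/*
 * Reset flow: returns how many recovery polls it made. Each poll stands
 * for one delay interval; nothing actually sleeps, so the result is
 * deterministic.
 */
static int sketch_reset_flow(const struct sketch_vf *vf, int max_polls)
{
	int polls = 0;

	while (!vf->comm_channel_up && polls < max_polls) {
		/* Shutdown has started: the VF is being detached, stop waiting. */
		if (vf->interface_state & SKETCH_STATE_NOWAIT)
			break;
		polls++; /* a real flow would sleep and re-check the device here */
	}
	return polls;
}
```

Without the flag, a disabled channel costs max_polls delay intervals; with
it, the reset flow returns immediately. Checking the flag inside the loop,
rather than once up front, is meant to cover a shutdown that begins while
the reset flow is already waiting.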

  Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
  Signed-off-by: Jack Morgenstein <[email protected]>
  Reported-by: Simon Xiao <[email protected]>
  Signed-off-by: Tariq Toukan <[email protected]>
  ---
   drivers/net/ethernet/mellanox/mlx4/cmd.c  | 11 +++++++++++
   drivers/net/ethernet/mellanox/mlx4/main.c | 11 +++++++++++
   include/linux/mlx4/device.h               |  1 +
   3 files changed, 23 insertions(+)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672785/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
