For Groovy, the proposed fix has already been applied to the generic
groovy/linux kernel as part of "Groovy update: v5.8.17 upstream stable
release" (bug 1902137). As a result, the patch applied to the linux-azure
branch was dropped during the rebase, so the commit no longer carries the
BugLink to this bug report. Because of that, this bug will not be closed
automatically when the package is released.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1894893

Title:
  [linux-azure][hibernation] GPU device no longer working after resume
  from hibernation in NV6 VM size

Status in linux-azure package in Ubuntu:
  Invalid
Status in linux-azure source package in Focal:
  Fix Committed
Status in linux-azure source package in Groovy:
  Fix Committed

Bug description:
  [Impact]

  There are failure messages in the log after resume from hibernation in an
  NV6 (GPU passthrough size) VM in Azure:
  [ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
  [ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5

  This happens with the latest stable release of the linux-azure
  5.4.0-1023.23 kernel and with the latest mainline linux kernel.

  [Test Case]

  How reproducible:
  100%

  Steps to Reproduce:
  1. Start a Standard_NV6 VM in Azure and enable hibernation properly (please
  refer to
  https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )

  E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6
  vcpus, 56 GiB memory) VM in East US 2.

  2. Make sure the in-kernel open-source nouveau driver is loaded, or
  blacklist the nouveau driver and install the official Nvidia GPU
  driver (please follow
  https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup :
  "Install GRID drivers on NV or NVv3-series VMs" -- the most important
  step is to run "./NVIDIA-Linux-x86_64-grid.run".)

  3. Run hibernation from serial console
  # systemctl hibernate

  4. After hibernation finishes, start VM and check dmesg
  # dmesg|grep fail

  Actual results:
  [ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
  [ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5

  And /proc/interrupts shows that the GPU interrupts are no longer
  happening.

  Expected results:
  No failed logs, and the GPU interrupt should still happen after hibernation.
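
  The dmesg check in step 4 can be narrowed to the specific failure
  reported here. The following is a minimal sketch, not part of the original
  report: the function name find_unmask_failures is hypothetical, and the
  pattern is modeled only on the two log lines quoted above, so other
  kernels may format the message differently.

```python
import re

# Pattern modeled on the hv_pci log lines quoted in this report.
# \s+ between tokens also tolerates lines that were wrapped in transit.
UNMASK_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+hv_pci\s+(?P<dev>[0-9a-fA-F-]+):\s+"
    r"hv_irq_unmask\(\)\s+failed:\s+(?P<code>0x[0-9a-fA-F]+)"
)


def find_unmask_failures(dmesg_text):
    """Return a (timestamp, device, status) tuple per hv_irq_unmask() failure."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("code"), 16))
        for m in UNMASK_FAIL.finditer(dmesg_text)
    ]
```

  To use it, pass the text printed by dmesg after resume. An empty list
  corresponds to the expected result; any entry corresponds to the actual
  result shown above.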

  [Regression Potential]

  The fix touches the pci-hyperv driver and could break the Hyper-V guest
  drivers. However, the change is focused on the execution path used for
  hibernation, which is still not officially supported.

  [Other info]

  BUG FIX:
  I made a fix here: https://lkml.org/lkml/2020/9/4/1268.

  Without the patch, we see the error "hv_pci
  47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5"
  during hibernation when the VM has the Nvidia GPU driver loaded, and
  after hibernation the GPU driver can no longer receive any MSI/MSI-X
  interrupts when we check /proc/interrupts.

  With the patch, we should no longer see the error, and the GPU driver
  should still receive interrupts after hibernation.
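
  One way to check whether the GPU driver still receives interrupts is to
  take two snapshots of /proc/interrupts after resume, some seconds apart
  while the GPU is in use, and compare the counters. The following is a
  minimal sketch, not part of the original report: the function names are
  hypothetical, and it assumes the driver name (for example "nvidia" or
  "nouveau") appears in the description column of the GPU's IRQ lines.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        per_cpu = fields[:ncpu]
        # Skip summary rows (e.g. ERR) that lack one counter per CPU.
        if len(per_cpu) < ncpu or not all(f.isdigit() for f in per_cpu):
            continue
        total = sum(int(f) for f in per_cpu)
        counts[label.strip()] = (total, " ".join(fields[ncpu:]))
    return counts


def stalled_irqs(before, after, keyword):
    """Return IRQ labels matching keyword whose count did not increase."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return sorted(
        irq
        for irq, (total, desc) in new.items()
        if keyword in desc and total <= old.get(irq, (0, ""))[0]
    )
```

  With the patch applied, stalled_irqs() on two such snapshots would be
  expected to return an empty list for the GPU driver's keyword, provided
  the GPU was active between the snapshots.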

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1894893/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
