This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:

apport-collect 1961968

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1961968

Title:
  Broken network on some AWS instances with focal/impish kernels

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  In Progress

Bug description:
  [Impact]
  With the latest focal/linux (5.4.0-101.114) and impish/linux
  (5.13.0-31.34) kernels built for SRU cycle 2022.02.21, some AWS
  instances fail to boot. The instance types mostly affected are
  c4.large, c3.xlarge and x1e.xlarge, although not every instance
  deployed on those types fails. c4.large is hit hardest, failing in
  about 80-90% of all deployments.

  This was traced to the network interface failing to come up. The
  following console log snippets from 5.4.0-101-generic on a c4.large
  give some hints of what's going on:

  [...]
  [    3.990368] unchecked MSR access error: RDMSR from 0xc90 at rIP: 0xffffffff8ea733c8 (native_read_msr+0x8/0x40)
  [    3.998463] Call Trace:
  [    4.001164]  ? set_rdt_options+0x91/0x91
  [    4.004864]  resctrl_late_init+0x592/0x63c
  [    4.008711]  ? set_rdt_options+0x91/0x91
  [    4.012452]  do_one_initcall+0x4a/0x200
  [    4.016115]  kernel_init_freeable+0x1c0/0x263
  [    4.020402]  ? rest_init+0xb0/0xb0
  [    4.024889]  kernel_init+0xe/0x110
  [    4.029245]  ret_from_fork+0x35/0x40
  [...]
  [    7.718268] ena: The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
  [    7.727036] ena: Failed to submit get_feature command 12 error: -62
  [    7.731691] ena 0000:00:03.0: Cannot init indirect table
  [    7.735636] ena 0000:00:03.0: Cannot init RSS rc: -62
  [    7.740700] ena: probe of 0000:00:03.0 failed with error -62
  [...]

  [Fix]
  Reverting the following upstream stable commit fixes the issue:

  83dbf898a2d4 PCI/MSI: Mask MSI-X vectors only on success

  [Test Case]
  Boot an affected AWS instance type with the focal/linux
  (5.4.0-101.114) and impish/linux (5.13.0-31.34) kernels with the
  mentioned patch reverted, then boot with the original kernels. The
  instance should boot successfully with the patch reverted but fail
  with the original kernels.
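
  When comparing many boots, checking each console log by eye gets
  tedious. The following is only a rough sketch, not part of the
  original test procedure: it assumes the console log has already been
  saved as text, and the function names (ena_probe_failures,
  shows_regression) are made up for illustration. The patterns are
  copied from the ena messages quoted under [Impact].

```python
import re

# Signatures taken from the console log quoted in this report.
# \s+ tolerates the message being wrapped across lines by a mail client.
MSIX_MISSED = re.compile(
    r"ena: The ena device sent a completion but the driver didn't\s+"
    r"receive a MSI-X interrupt"
)
ENA_PROBE_FAILED = re.compile(r"ena: probe of \S+ failed with error (-\d+)")


def ena_probe_failures(console_log):
    """Return the error code of every failed ena probe found in the log."""
    return [int(m.group(1)) for m in ENA_PROBE_FAILED.finditer(console_log)]


def shows_regression(console_log):
    """True if the log has both the missed MSI-X message and a failed probe.

    Hypothetical helper: it only matches the two messages seen in this
    report and makes no claim about other failure modes.
    """
    return bool(MSIX_MISSED.search(console_log)) and bool(
        ena_probe_failures(console_log)
    )
```

  A log from a kernel with the commit reverted would be expected to
  make shows_regression() return False, and a log from a failing boot
  of the original kernel to make it return True.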

  [Regression Potential]
  The patch description mentions fixing an MSI-X issue with a Marvell
  NVMe device, which doesn't seem to follow the PCIe specification.
  Reverting this commit will leave the issue unfixed on systems with
  that particular NVMe device.
  As of now there is no follow-up fix for this commit upstream. We
  might need to keep an eye on any change and re-apply the commit if a
  fix is found.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1961968/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
