This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:
apport-collect 1961968
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.
** Changed in: linux (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1961968
Title:
Broken network on some AWS instances with focal/impish kernels
Status in linux package in Ubuntu:
Incomplete
Status in linux source package in Focal:
In Progress
Status in linux source package in Impish:
In Progress
Bug description:
[Impact]
With the latest focal/linux (5.4.0-101.114) and impish/linux (5.13.0-31.34)
kernels built for SRU cycle 2022.02.21 some AWS instances fail to boot. This
impacts mostly the instance types: c4.large, c3.xlarge and x1e.xlarge. However,
not all instances deployed on those types will fail. This is affecting mostly
c4.large which fails about 80-90% of all deployments.
This was traced to be caused by the network interface failing to come
up. The following console log snippets from 5.4.0-101-generic on a
c4.large show some hints of what's going on:
[...]
[ 3.990368] unchecked MSR access error: RDMSR from 0xc90 at rIP:
0xffffffff8ea733c8 (native_read_msr+0x8/0x40)
[ 3.998463] Call Trace:
[ 4.001164] ? set_rdt_options+0x91/0x91
[ 4.004864] resctrl_late_init+0x592/0x63c
[ 4.008711] ? set_rdt_options+0x91/0x91
[ 4.012452] do_one_initcall+0x4a/0x200
[ 4.016115] kernel_init_freeable+0x1c0/0x263
[ 4.020402] ? rest_init+0xb0/0xb0
[ 4.024889] kernel_init+0xe/0x110
[ 4.029245] ret_from_fork+0x35/0x40
[...]
[ 7.718268] ena: The ena device sent a completion but the driver didn't
receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
[ 7.727036] ena: Failed to submit get_feature command 12 error: -62
[ 7.731691] ena 0000:00:03.0: Cannot init indirect table
[ 7.735636] ena 0000:00:03.0: Cannot init RSS rc: -62
[ 7.740700] ena: probe of 0000:00:03.0 failed with error -62
[...]
[Fix]
Reverting the following upstream stable commit fixes the issue:
83dbf898a2d4 PCI/MSI: Mask MSI-X vectors only on success
[Test Case]
Boot an affected AWS instance type with focal/linux (5.4.0-101.114) and
impish/linux (5.13.0-31.34) kernels with the mentioned patch reverted. Then
boot with the original kernels. It should boot successfully with the reverted
patch but fail with the original kernels.
[Regression Potential]
The patch description mentions fixing a MSI-X issue with a Marvell NVME
device, which doesn't seem to be following the PCI-E specification. Reverting
this commit will keep the issue on systems with that particular NVME device
unfixed.
As of now there is no follow-up fix for this commit upstream, we might need
to keep an eye on any change and re-apply it in case a fix is found.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1961968/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp