** Changed in: linux (Ubuntu)
Status: Confirmed => Fix Released
** Changed in: linux (Ubuntu Jammy)
Status: Triaged => In Progress
** Changed in: linux (Ubuntu Jammy)
Assignee: (unassigned) => Stefan Bader (smb)
** Description changed:
+ Impact:
+ We had reports of VM setups which would crash intermittently and then lock
up completely. This could be reproduced with large memory setups.
+ The problem seems to be that fixes for performance regressions caused more
problems in 5.15 kernels, and the full fixes are too intrusive to be backported.
+
+ Fix:
+ The following patch was recently sent to the upstream stable mailing list and
looks to be making its way into linux-5.15.y. This changes the default value of
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed
back in config).
+
+ Regression potential:
+ VM hosts with many large memory tenants might see a performance impact which
the TDP MMU approach tried to solve. Hosts that did not see other problems
might turn this on again.
+
+ Testcase:
+ Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large
second level guest (32GB memory). Repeatedly starting and stopping the 2nd
level guest.
+
+
+ --- original description ---
The crash occurred on a juju machine, and the juju agent was lost.
The juju machine is on an openstack instance provisioned by juju.
The openstack console log indicates that it is related to spin_lock and KVM MMU:
[418200.348830] ? _raw_spin_lock+0x22/0x30
[418200.349588] _raw_write_lock+0x20/0x30
[418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
[418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
[418200.351796] direct_page_fault+0x206/0x310 [kvm]
[418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0
[418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm]
[418200.354496] try_to_migrate_one+0x691/0x730
[418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm]
openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
The syslog was rotated after the crash occurred, so the syslog at the time of
the initial crash was lost.
Other juju machines with the 5.15.0.79.76 kernel seem to have the same
issue.
We previously had a similar issue with 5.15.0-73. The juju machine
crashed with raw_spin_lock and kvm mmu in the logs as well:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-46-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata (http://169.254.169.254)
Date: Mon Aug 21 08:59:46 2023
Ec2AMI: ami-00000c61
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: availability-zone-1
Ec2InstanceType: builder-cpu4-ram72-disk20
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-hwe-5.19
UpgradeStatus: No upgrade log present (probably fresh install)
- ---
+ ---
ProblemType: Bug
AlsaDevices:
- total 0
- crw-rw---- 1 root audio 116, 1 Aug 23 03:23 seq
- crw-rw---- 1 root audio 116, 33 Aug 23 03:23 timer
+ total 0
+ crw-rw---- 1 root audio 116, 1 Aug 23 03:23 seq
+ crw-rw---- 1 root audio 116, 33 Aug 23 03:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata (http://169.254.169.254)
DistroRelease: Ubuntu 22.04
Ec2AMI: ami-00000fbb
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: availability-zone-2
Ec2InstanceType: builder-cpu2-ram44-disk20
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t: /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
MachineType: OpenStack Foundation OpenStack Nova
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
-
+
ProcEnviron:
- TERM=xterm-256color
- PATH=(custom, no user)
- LANG=C.UTF-8
- SHELL=/bin/bash
+ TERM=xterm-256color
+ PATH=(custom, no user)
+ LANG=C.UTF-8
+ SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic
root=UUID=a6de04b8-3631-4ce4-bb96-48076f4a56bf ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 5.15.0-83.92-generic 5.15.116
RelatedPackageVersions:
- linux-restricted-modules-5.15.0-83-generic N/A
- linux-backports-modules-5.15.0-83-generic N/A
- linux-firmware 20220329.git681281e4-0ubuntu3.17
+ linux-restricted-modules-5.15.0-83-generic N/A
+ linux-backports-modules-5.15.0-83-generic N/A
+ linux-firmware 20220329.git681281e4-0ubuntu3.17
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy ec2-images
Uname: Linux 5.15.0-83-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.13.0-1ubuntu1.1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-4.2
dmi.modalias:
dmi:bvnSeaBIOS:bvr1.13.0-1ubuntu1.1:bd04/01/2014:br0.0:svnOpenStackFoundation:pnOpenStackNova:pvr21.2.4:cvnQEMU:ct1:cvrpc-i440fx-4.2:sku:
dmi.product.family: Virtual Machine
dmi.product.name: OpenStack Nova
dmi.product.version: 21.2.4
dmi.sys.vendor: OpenStack Foundation
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176
Title:
Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
5.19.0-46.47-22.04.1
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Jammy:
In Progress
Bug description:
Impact:
We had reports of VM setups which would crash intermittently and then lock
up completely. This could be reproduced with large memory setups.
The problem seems to be that fixes for performance regressions caused more
problems in 5.15 kernels, and the full fixes are too intrusive to be backported.
Fix:
The following patch was recently sent to the upstream stable mailing list and
looks to be making its way into linux-5.15.y. This changes the default value of
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed
back in config).
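To check which setting a host is actually running with, the sketch below reads the parameter from sysfs. It assumes the kvm module exposes the value at /sys/module/kvm/parameters/tdp_mmu, the usual location for a module parameter named kvm.tdp_mmu. The helper name tdp_mmu_state is illustrative only and does not come from the patch.

```python
from pathlib import Path

# Assumed sysfs location of the kvm.tdp_mmu module parameter.
PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")

def tdp_mmu_state(path=PARAM):
    """Return True/False for on/off, or None if the parameter is not readable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None  # kvm not loaded, or parameter absent on this kernel
    # Boolean module parameters are normally reported as "Y" or "N".
    return value in ("Y", "y", "1")

if __name__ == "__main__":
    labels = {True: "tdp_mmu: on", False: "tdp_mmu: off", None: "tdp_mmu: unknown"}
    print(labels[tdp_mmu_state()])
```

Turning it back on persistently would typically be done with a modprobe option such as "options kvm tdp_mmu=Y" in a file under /etc/modprobe.d/, or with kvm.tdp_mmu=Y on the kernel command line. The exact file name is site-specific.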
Regression potential:
VM hosts with many large memory tenants might see a performance impact which
the TDP MMU approach tried to solve. Hosts that did not see other problems
might turn this on again.
Testcase:
Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large
second level guest (32GB memory). Repeatedly starting and stopping the 2nd
level guest.
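The start/stop loop from the test case could be scripted roughly as follows. This is a sketch only: it assumes the second level guest is managed by libvirt and driven through virsh, which the report does not state. The function name, iteration count and uptime are placeholders.

```python
import subprocess
import time

def cycle_guest(domain, iterations=50, uptime=60.0,
                run=subprocess.run, sleep=time.sleep):
    """Repeatedly start and force-stop a libvirt domain.

    Returns the number of completed start/stop cycles. The run and sleep
    arguments exist so the loop can be exercised without a hypervisor.
    """
    done = 0
    for _ in range(iterations):
        # 'virsh start' boots an already defined domain.
        if run(["virsh", "start", domain]).returncode != 0:
            break
        sleep(uptime)  # give the guest time to touch its memory
        # 'virsh destroy' is a hard power-off; the domain stays defined.
        if run(["virsh", "destroy", domain]).returncode != 0:
            break
        done += 1
    return done
```

A call such as cycle_guest("nested-guest", iterations=10), with "nested-guest" being a hypothetical domain name, would be run inside the 64GB first level instance while watching its console log for soft lockup messages.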
--- original description ---
The crash occurred on a juju machine, and the juju agent was lost.
The juju machine is on an openstack instance provisioned by juju.
The openstack console log indicates that it is related to spin_lock and KVM MMU:
[418200.348830] ? _raw_spin_lock+0x22/0x30
[418200.349588] _raw_write_lock+0x20/0x30
[418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
[418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
[418200.351796] direct_page_fault+0x206/0x310 [kvm]
[418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0
[418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm]
[418200.354496] try_to_migrate_one+0x691/0x730
[418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm]
openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
The syslog was rotated after the crash occurred, so the syslog at the time of
the initial crash was lost.
Other juju machines with the 5.15.0.79.76 kernel seem to have the same
issue.
We previously had a similar issue with 5.15.0-73. The juju machine
crashed with raw_spin_lock and kvm mmu in the logs as well:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-46-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata (http://169.254.169.254)
Date: Mon Aug 21 08:59:46 2023
Ec2AMI: ami-00000c61
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: availability-zone-1
Ec2InstanceType: builder-cpu4-ram72-disk20
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-hwe-5.19
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Aug 23 03:23 seq
crw-rw---- 1 root audio 116, 33 Aug 23 03:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata (http://169.254.169.254)
DistroRelease: Ubuntu 22.04
Ec2AMI: ami-00000fbb
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: availability-zone-2
Ec2InstanceType: builder-cpu2-ram44-disk20
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t: /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
MachineType: OpenStack Foundation OpenStack Nova
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic
root=UUID=a6de04b8-3631-4ce4-bb96-48076f4a56bf ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 5.15.0-83.92-generic 5.15.116
RelatedPackageVersions:
linux-restricted-modules-5.15.0-83-generic N/A
linux-backports-modules-5.15.0-83-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.17
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy ec2-images
Uname: Linux 5.15.0-83-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.13.0-1ubuntu1.1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-4.2
dmi.modalias:
dmi:bvnSeaBIOS:bvr1.13.0-1ubuntu1.1:bd04/01/2014:br0.0:svnOpenStackFoundation:pnOpenStackNova:pvr21.2.4:cvnQEMU:ct1:cvrpc-i440fx-4.2:sku:
dmi.product.family: Virtual Machine
dmi.product.name: OpenStack Nova
dmi.product.version: 21.2.4
dmi.sys.vendor: OpenStack Foundation
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2032176/+subscriptions