Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux (Ubuntu Jammy)
Status: New => Confirmed
https://bugs.launchpad.net/bugs/2065153
Title:
[qxl] Ubuntu 24.04 VM guest console freezes after some hours
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Jammy:
Confirmed
Status in linux source package in Noble:
Confirmed
Bug description:
Thank you @dreibh for reporting the bug and providing the original
description!
[ Impact ]
* The qxl driver has a bug that freezes the console on qxl
paravirtualized GPUs. The system itself does not hang and remains accessible
by other means such as SSH, but the virtual console output stops updating.
The following dmesg output is seen when the issue occurs:
[ 280.618452] [TTM] Buffer eviction failed
[ 280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[ 280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
* The issue was caused by commit 5a838e5d5825 ("drm/qxl: simplify
qxl_fence_wait"), which does not add any new code but tries to simplify the
already existing function.
Because of the problems it caused, this commit has been reverted upstream
with 07ed11afb68d ("Revert "drm/qxl: simplify qxl_fence_wait""). The revert
also adds back the DMA_FENCE_WARN macro, because the restored function uses it.
The macro was originally removed by d72277b6c37d ("dma-buf: nuke
DMA_FENCE_TRACE macros v2").
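One way to check whether a given kernel git tree already carries the revert is to search the commit subjects that touch the qxl driver. The helper below is only a sketch: the function name `has_qxl_revert` is hypothetical, and it assumes the backport keeps the upstream subject line unchanged. It matches on the subject rather than on the hash because a cherry-picked commit gets a new hash in a distribution tree.

```shell
#!/bin/bash
# Hypothetical helper (not part of any existing tool).
# Reads commit subject lines on stdin and returns 0 if one of them
# contains the subject of the upstream revert, 1 otherwise.
# Assumption: the backport keeps the upstream subject text.
has_qxl_revert() {
    grep -q -F 'Revert "drm/qxl: simplify qxl_fence_wait"'
}
```

From the top of a kernel tree it could be used as `git log --format=%s -- drivers/gpu/drm/qxl | has_qxl_revert && echo "revert present"`.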
[ Test Plan ]
To reproduce the bug, follow the steps below:
1. Install an Ubuntu version with an affected kernel in a VM and make
sure that the QXL video driver is in use instead of virtio. The server
edition is enough for the reproducer; no desktop environment needs to be
installed. The issue is reproducible on Jammy 5.15 and above, except on
Plucky, since the fix is included in kernel 6.14.
2. Create a script with the following content and make it executable:
```
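#!/bin/bash
# Likely needs root: chvt and writing to /dev/tty3 are privileged operations.
chvt 3
for j in $(seq 80); do
    echo "$(date) starting round $j"
    # Stop as soon as the failure signature shows up in this boot's journal.
    if journalctl --boot | grep -q "failed to allocate VRAM BO"; then
        echo "bug was reproduced after $j tries"
        exit 1
    fi
    # Repeatedly write the kernel log to tty3 to generate console output.
    for i in $(seq 100); do
        dmesg > /dev/tty3
    done
done
echo "bug could not be reproduced"
exit 0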
```
3. Execute the script from the virtual console and, from an SSH session,
monitor the dmesg logs until you see the following:
[ 280.618452] [TTM] Buffer eviction failed
[ 280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[ 280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
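The checks in steps 1 and 3 can also be scripted from the SSH session. The two helpers below are a minimal sketch with hypothetical names (`qxl_module_loaded`, `qxl_failure_seen`); both only filter text on stdin, so they assume you pipe in `lsmod` output and kernel log output respectively. A loaded qxl module suggests, but does not prove, that the QXL driver is the one driving the console.

```shell
#!/bin/bash
# Hypothetical helpers for the test plan; names are illustrative only.

# Return 0 if the lsmod-style listing on stdin has a "qxl" module entry.
# Only the first column is checked, so "ttm ... qxl,drm_ttm_helper"
# (qxl listed as a user of another module) does not count.
qxl_module_loaded() {
    awk '$1 == "qxl" { found = 1 } END { exit !found }'
}

# Return 0 if the kernel log text on stdin contains the failure
# signature that the reproducer script also looks for.
qxl_failure_seen() {
    grep -q -F 'failed to allocate VRAM BO'
}
```

Possible usage: `lsmod | qxl_module_loaded || echo "qxl not loaded"` before starting, then `sudo dmesg | qxl_failure_seen && echo "bug reproduced"` while the script runs on the virtual console.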
[ Where problems could occur ]
* Virtual displays might still freeze or hang
* Warning messages related to the qxl driver might occur.
[ Other Info ]
* The patch does cause a warning message to show up on boot when using
the qxl video driver. The warning itself is harmless and does not seem
to have any negative effects in my testing:
[ 5.011445] WARNING: CPU: 15 PID: 822 at kernel/workqueue.c:2985 check_flush_dependency.part.0+0xde/0x140
[ 5.011449] Modules linked in: qrtr cfg80211 binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm snd_hda_codec_generic irqbypass snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl snd_hda_codec snd_hda_core snd_hwdep snd_pcm joydev snd_timer snd qxl i2c_i801 soundcore drm_ttm_helper i2c_smbus lpc_ich ttm input_leds mac_hid serio_raw sch_fq_codel dm_multipath msr efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 ahci sha1_ssse3 libahci psmouse virtio_rng xhci_pci xhci_pci_renesas aesni_intel crypto_simd cryptd
[ 5.011493] CPU: 15 PID: 822 Comm: kworker/u65:1 Not tainted 6.8.0-999-generic #70
[ 5.011495] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 5.011496] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 5.011501] RIP: 0010:check_flush_dependency.part.0+0xde/0x140
[ 5.011502] Code: 24 18 4d 89 f0 49 8d 8d b0 00 00 00 48 c7 c7 e0 8f e6 8a c6 05 f3 90 8c 02 01 48 8b 70 08 48 81 c6 b0 00 00 00 e8 a2 5e fd ff <0f> 0b eb 91 0f b6 1d d9 90 8c 02 80 fb 01 0f 87 38 57 0a 01 83 e3
[ 5.011503] RSP: 0018:ffffbd85c0ce7c28 EFLAGS: 00010046
[ 5.011505] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5.011506] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 5.011506] RBP: ffffbd85c0ce7c48 R08: 0000000000000000 R09: 0000000000000000
[ 5.011507] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f308158a540
[ 5.011508] R13: ffff9f30801cea00 R14: ffffffffc0946570 R15: 0000000000000000
[ 5.011509] FS: 0000000000000000(0000) GS:ffff9f31f7d80000(0000) knlGS:0000000000000000
[ 5.011510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.011510] CR2: 000000c000a02000 CR3: 0000000108cf8000 CR4: 0000000000750ef0
[ 5.011514] PKRU: 55555554
[ 5.011514] Call Trace:
[ 5.011516] <TASK>
[ 5.011518] ? show_regs+0x6d/0x80
[ 5.011521] ? __warn+0x89/0x160
[ 5.011523] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011524] ? report_bug+0x17e/0x1b0
[ 5.011527] ? handle_bug+0x6e/0xb0
[ 5.011529] ? exc_invalid_op+0x18/0x80
[ 5.011532] ? asm_exc_invalid_op+0x1b/0x20
[ 5.011535] ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
[ 5.011539] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011540] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011541] start_flush_work+0xba/0x340
[ 5.011543] flush_work+0x5f/0xb0
[ 5.011545] qxl_queue_garbage_collect+0x8c/0x90 [qxl]
[ 5.011548] qxl_fence_wait+0xa3/0x1b0 [qxl]
[ 5.011552] dma_fence_wait_timeout+0x64/0x140
[ 5.011555] dma_resv_wait_timeout+0x7f/0xf0
[ 5.011556] ttm_bo_delayed_delete+0x2a/0xc0 [ttm]
[ 5.011560] process_one_work+0x181/0x3a0
[ 5.011562] worker_thread+0x306/0x440
[ 5.011563] ? __pfx_worker_thread+0x10/0x10
[ 5.011565] kthread+0xef/0x120
[ 5.011569] ? __pfx_kthread+0x10/0x10
[ 5.011572] ret_from_fork+0x44/0x70
[ 5.011574] ? __pfx_kthread+0x10/0x10
[ 5.011578] ret_from_fork_asm+0x1b/0x30
[ 5.011581] </TASK>
[ 5.011582] ---[ end trace 0000000000000000 ]---
* The Jammy version of the patch (5.15) does not need the
reintroduction of the DMA_FENCE_WARN macro since it already exists there.
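Whether a particular source tree still defines the macro can be checked with a simple search before deciding if the backport needs to restore it. This is a sketch only: the function name `has_dma_fence_warn` is hypothetical, and it assumes the macro, when present, is defined in `include/linux/dma-fence.h` of the tree passed as the first argument.

```shell
#!/bin/bash
# Hypothetical helper; assumes the macro lives in include/linux/dma-fence.h.
# Usage: has_dma_fence_warn /path/to/kernel/tree
# Returns 0 if a "#define DMA_FENCE_WARN" line is found, non-zero otherwise
# (including when the header does not exist).
has_dma_fence_warn() {
    tree="$1"
    grep -q -E '^[[:space:]]*#[[:space:]]*define[[:space:]]+DMA_FENCE_WARN[^A-Za-z0-9_]' \
        "$tree/include/linux/dma-fence.h" 2>/dev/null
}
```

For example, `has_dma_fence_warn . && echo "macro present, no re-add needed"` run from the top of the tree being patched.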
[Original Description]
I made simple Ubuntu 24.04 LTS Server installations as guests in an
up-to-date Proxmox. No Xorg/Wayland, just CLI! The virtual graphics card is
qxl, 16 MiB memory (standard settings). Opening the console in the Proxmox GUI,
or via remote-viewer, is initially fine. However, after some time (usually
hours), the console just locks up. SSH into the guest machine remains
fine.
Ubuntu 22.04 and 20.04 are fine; the issue only occurs with the new
Ubuntu 24.04. The issue is reproducible with all Ubuntu 24.04 VMs. A
reboot of the VM makes the console usable again, until the issue
occurs again (usually after some hours).
Unusual observation from dmesg:
...
[522890.748557] [TTM] Buffer eviction failed
[522890.748981] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522890.749336] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522906.108616] [TTM] Buffer eviction failed
[522906.109045] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522906.109386] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522921.468729] [TTM] Buffer eviction failed
[522921.469154] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522921.469512] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522936.828783] [TTM] Buffer eviction failed
[522936.829207] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522936.829630] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
...
nornetpp@hansa:~$ uname -a
Linux hansa.management.crnalab.net 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
nornetpp@hansa:~$ lsmod | grep qxl
qxl 86016 0
drm_ttm_helper 12288 1 qxl
ttm 110592 2 qxl,drm_ttm_helper
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: xorg (not installed)
ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed May 8 11:05:07 2024
InstallationDate: Installed on 2024-03-12 (57 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Daily amd64 (20240312)
ProcEnviron:
LANG=en_IE.UTF-8
LANGUAGE=nb:de:en_US
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)