https://bugs.kde.org/show_bug.cgi?id=519837
Bug ID: 519837
Summary: KWin freezes for ~12 seconds on GPU suspend/resume on
dual-AMDGPU systems (Failed to open drm device)
Classification: Plasma
Product: kwin
Version First Reported In: 6.5.6
Platform: Fedora RPMs
OS: Linux
Status: REPORTED
Severity: major
Priority: NOR
Component: wayland-generic
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
SUMMARY
On dual-AMDGPU systems, changing the power state (suspend/resume) of the
dedicated GPU will cause KWin to hang for at least 10 seconds. After the hang,
it continues to function normally until the next state change. Such state
changes occur either when starting an application explicitly on the dGPU or a
few seconds after closing the last application that was using the dGPU.
This was previously reported for version 6.5.6 in the Manjaro Forums:
https://forum.manjaro.org/t/kwin-wayland-main-thread-hangs-12s-on-amd-hybrid-gpu-rog-g14-kwin-trying-to-open-d3cold-dgpu/186640
I experience this on version 6.6.4.
STEPS TO REPRODUCE
1. Obtain a dual-AMDGPU system, ideally a laptop with hybrid graphics. I'm not
sure whether this is reproducible on desktops with two AMD GPUs, but I can
check if need be.
2. Verify the dGPU power state with `amdgpu_top`. It should say "(Suspended)"
in the device list.
3. Run an application on the dGPU: `DRI_PRIME=1 vkcube`
4. Observe the hang as well as the output in the kernel log.
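If `amdgpu_top` is not installed, the power state from step 2 can also be read from sysfs. The following is a minimal sketch, assuming the GPUs expose the standard runtime PM attribute `power/runtime_status`; the function name `drm_runtime_status` and its optional sysfs-root argument are hypothetical and exist only for illustration and testing.

```shell
# Print "<card>: <runtime PM status>" for every DRM card node.
# $1 is an optional sysfs root (default /sys); it is there only so the
# function can be pointed at a test tree.
drm_runtime_status() (
    sysfs_root="${1:-/sys}"
    for card in "$sysfs_root"/class/drm/card*; do
        name="${card##*/}"
        # Keep only cardN; skip connector entries such as card1-eDP-1
        # and the unexpanded glob when no card exists.
        case "${name#card}" in
            ''|*[!0-9]*) continue ;;
        esac
        status_file="$card/device/power/runtime_status"
        if [ -r "$status_file" ]; then
            printf '%s: %s\n' "$name" "$(cat "$status_file")"
        else
            printf '%s: unknown\n' "$name"
        fi
    done
)
```

Calling `drm_runtime_status` before step 3 should report `suspended` for the dGPU, and `active` once an application is running on it.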
OBSERVED RESULT
KWin hangs for ~12 seconds and prints "Failed to open drm device" to the
Journal multiple times.
EXPECTED RESULT
KWin should handle the GPU power state change gracefully without hanging.
SYSTEM INFORMATION
Operating System: Fedora Linux 43
KDE Plasma Version: 6.6.4
KDE Frameworks Version: 6.25.0
Qt Version: 6.10.3
Kernel Version: 6.19.12-200.fc43.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
Memory: 96 GB of RAM (92.1 GB usable)
Graphics Processor 1: AMD Radeon 780M Graphics
Graphics Processor 2: AMD Radeon RX 7600M XT
Manufacturer: TUXEDO
Product Name: TUXEDO Sirius 16 Gen2
Mesa Version: 25.3.6
WORKAROUND
According to the Manjaro forum post, restricting KWin to the iGPU using
`KWIN_DRM_DEVICES` works around this bug.
Create a file named `~/.config/environment.d/kwin-gpu.conf` with the following
content:
`KWIN_DRM_DEVICES=/dev/dri/card2`
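The card number is specific to my machine, so the right value for `KWIN_DRM_DEVICES` may differ elsewhere. As a rough sketch, the mapping from card nodes to PCI devices can be listed from sysfs, assuming PCI GPUs whose `uevent` file carries the usual `PCI_SLOT_NAME` and `DRIVER` keys. The function name `drm_list_cards` and its optional sysfs-root argument are hypothetical.

```shell
# Print "<device node> <PCI slot> <driver>" for every DRM card node.
# $1 is an optional sysfs root (default /sys), used only for testing.
drm_list_cards() (
    sysfs_root="${1:-/sys}"
    for card in "$sysfs_root"/class/drm/card*; do
        name="${card##*/}"
        # Keep only cardN; skip connectors such as card1-eDP-1.
        case "${name#card}" in
            ''|*[!0-9]*) continue ;;
        esac
        uevent="$card/device/uevent"
        [ -r "$uevent" ] || continue
        slot="$(sed -n 's/^PCI_SLOT_NAME=//p' "$uevent")"
        driver="$(sed -n 's/^DRIVER=//p' "$uevent")"
        # "?" marks a key that was missing from the uevent file.
        printf '/dev/dri/%s %s %s\n' "$name" "${slot:-?}" "${driver:-?}"
    done
)
```

On my system the dGPU sits at PCI address `0000:03:00.0` (see the kernel log), so the card listed with a different address is the iGPU that KWin should be restricted to.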
ADDITIONAL INFORMATION
After such a hang, the Journal will contain a lot of messages like these:
```
kwin_wayland[1906]: Failed to open drm device
kwin_wayland[1906]: Failed to open drm device /dev/dri/renderD129
kwin_wayland[1906]: Failed to open drm device /dev/dri/renderD128
kwin_wayland[1906]: Failed to open drm device /dev/dri/card1
kwin_wayland[1906]: Failed to open drm device /dev/dri/card1
```
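To get an overview of which device nodes are affected, the messages can be tallied per path. This is a small sketch that reads journal text on stdin; the function name `count_drm_open_failures` is hypothetical, and messages without a path are grouped under the placeholder `(none)`.

```shell
# Read journal lines on stdin and print "<count> <device>" for each
# "Failed to open drm device" message, sorted by device path.
count_drm_open_failures() {
    awk '
        /Failed to open drm device/ {
            dev = $0
            # Strip everything up to and including the message text.
            sub(/.*Failed to open drm device[ \t]*/, "", dev)
            if (dev == "") dev = "(none)"
            count[dev]++
        }
        END {
            for (dev in count) printf "%d %s\n", count[dev], dev
        }
    ' | LC_ALL=C sort -k 2
}
```

One way to feed it is `journalctl -b _COMM=kwin_wayland | count_drm_open_failures`, assuming the KWin messages are visible to the invoking user in the journal of the current boot.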
The kernel log will look somewhat like this:
```
[drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525f00 (82.95.0)
amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
amdgpu 0000:03:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x07002F00
amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
```