https://bugs.kde.org/show_bug.cgi?id=519837

            Bug ID: 519837
           Summary: KWin freezes for ~12 seconds on GPU suspend/resume on
                    dual-AMDGPU systems (Failed to open drm device)
    Classification: Plasma
           Product: kwin
Version First Reported In: 6.5.6
          Platform: Fedora RPMs
                OS: Linux
            Status: REPORTED
          Severity: major
          Priority: NOR
         Component: wayland-generic
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

SUMMARY

On dual-AMDGPU systems, any change in the power state of the dedicated GPU
(suspend or resume) causes KWin to hang for at least 10 seconds. After the
hang, KWin functions normally until the next state change. Such state changes
occur either when an application is started explicitly on the dGPU or a few
seconds after the last application using the dGPU is closed.

This was previously reported for version 6.5.6 in the Manjaro Forums:
https://forum.manjaro.org/t/kwin-wayland-main-thread-hangs-12s-on-amd-hybrid-gpu-rog-g14-kwin-trying-to-open-d3cold-dgpu/186640

I experience this on version 6.6.4.

STEPS TO REPRODUCE
1. Obtain a dual-AMDGPU system, ideally a laptop with hybrid graphics. I'm not
sure whether this is reproducible on desktops with two AMDGPUs, but I can check
if need be.
2. Verify dGPU power state with `amdgpu_top`. It should say "(Suspended)" in
the device list.
3. Run an application on the dGPU: `DRI_PRIME=1 vkcube`
4. Observe the hang, along with the messages in the Journal and the kernel log.
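
As an alternative to `amdgpu_top` in step 2, the runtime power state can be read from sysfs. The following is a minimal sketch, assuming the kernel exposes `power/runtime_status` for each card under `/sys/class/drm`. The function name `gpu_power_states` and its optional root argument are made up for illustration:

```shell
#!/usr/bin/env bash
# Print the runtime power state of every DRM card, one per line.
# The optional first argument overrides the sysfs root (illustrative only).
gpu_power_states() {
    local root="${1:-/sys/class/drm}"
    local card status
    for card in "$root"/card[0-9]*; do
        # Skip connector entries such as card1-eDP-1
        case "${card##*/}" in *-*) continue ;; esac
        status="$card/device/power/runtime_status"
        [ -r "$status" ] || continue
        printf '%s %s\n' "${card##*/}" "$(cat "$status")"
    done
}

gpu_power_states "$@"
```

Before step 3 the dGPU's card should report `suspended`. Running the function again while `vkcube` is open should show it as `active`.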

OBSERVED RESULT

KWin hangs for ~12 seconds and prints "Failed to open drm device" to the
Journal multiple times.

EXPECTED RESULT

KWin should handle the GPU power state change gracefully without hanging.

SYSTEM INFORMATION

Operating System: Fedora Linux 43
KDE Plasma Version: 6.6.4
KDE Frameworks Version: 6.25.0
Qt Version: 6.10.3
Kernel Version: 6.19.12-200.fc43.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
Memory: 96 GB of RAM (92.1 GB usable)
Graphics Processor 1: AMD Radeon 780M Graphics
Graphics Processor 2: AMD Radeon RX 7600M XT
Manufacturer: TUXEDO
Product Name: TUXEDO Sirius 16 Gen2

Mesa Version: 25.3.6

WORKAROUND

According to the Manjaro forum post, restricting KWin to the iGPU using
`KWIN_DRM_DEVICES` works around this bug.

Create a file named `~/.config/environment.d/kwin-gpu.conf` with the following
content, replacing `card2` with the iGPU's device node if it differs on your
system, then log out and back in:
`KWIN_DRM_DEVICES=/dev/dri/card2`
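
The card numbering under `/dev/dri` is not stable across systems, so it helps to check which node belongs to which GPU before setting `KWIN_DRM_DEVICES`. This is a rough sketch, assuming the standard sysfs attributes `vendor` and `boot_vga` are present for each PCI GPU. The function name `list_drm_cards` and its optional root argument are hypothetical:

```shell
#!/usr/bin/env bash
# List each DRM card node with its PCI address, vendor ID and boot_vga flag.
# The optional first argument overrides the sysfs root (illustrative only).
list_drm_cards() {
    local root="${1:-/sys/class/drm}"
    local card name pci vendor boot
    for card in "$root"/card[0-9]*; do
        name="${card##*/}"
        # Skip connector entries such as card1-eDP-1
        case "$name" in *-*) continue ;; esac
        [ -d "$card/device" ] || continue
        # The device symlink resolves to the PCI device directory
        pci="$(basename "$(readlink -f "$card/device")")"
        vendor="$(cat "$card/device/vendor" 2>/dev/null || echo unknown)"
        boot="$(cat "$card/device/boot_vga" 2>/dev/null || echo '?')"
        printf '/dev/dri/%s pci=%s vendor=%s boot_vga=%s\n' \
            "$name" "$pci" "$vendor" "$boot"
    done
}

list_drm_cards "$@"
```

In the kernel log below, the GPU that resumes is at PCI address `0000:03:00.0`, so on this machine the node with that address would be the dGPU and the other node the one to keep. On many hybrid laptops the iGPU is also the one with `boot_vga=1`, but that is an assumption worth verifying.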

ADDITIONAL INFORMATION

After such a hang, the Journal will contain a lot of messages like these:
```
kwin_wayland[1906]: Failed to open drm device 
kwin_wayland[1906]: Failed to open drm device /dev/dri/renderD129
kwin_wayland[1906]: Failed to open drm device /dev/dri/renderD128
kwin_wayland[1906]: Failed to open drm device /dev/dri/card1
kwin_wayland[1906]: Failed to open drm device /dev/dri/card1
```
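
To see which device nodes KWin failed to open and how often, the messages can be tallied per path. This is only a sketch; the function name `count_drm_open_failures` is made up, and it assumes the message text matches the lines shown above:

```shell
#!/usr/bin/env bash
# Read log lines on stdin and count "Failed to open drm device" per path.
# Lines without a path are grouped under "(no path)".
count_drm_open_failures() {
    awk -F'Failed to open drm device' '
        NF > 1 {
            p = $2
            gsub(/^ +| +$/, "", p)      # trim surrounding spaces
            if (p == "") p = "(no path)"
            n[p]++
        }
        END { for (p in n) print n[p], p }
    ' | LC_ALL=C sort -k2
}

# Usage (reads the current boot's KWin messages):
#   journalctl -b _COMM=kwin_wayland | count_drm_open_failures
```

Depending on how the session is started, the KWin messages may only be visible with `journalctl --user`.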

The kernel log will look somewhat like this:
```
[drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525f00 (82.95.0)
amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
amdgpu 0000:03:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x07002F00
amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
```
