https://bugs.kde.org/show_bug.cgi?id=521164

            Bug ID: 521164
           Summary: KWin Wayland freezes the whole system at suspend when
                    the NVIDIA GPU's DRM devices get removed
    Classification: Plasma
           Product: kwin
Version First Reported In: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: wayland-generic
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 192963
  --> https://bugs.kde.org/attachment.cgi?id=192963&action=edit
Kwin-log

I have a laptop with an NVIDIA GPU (RTX 5080) running PikaOS. When I suspend
the machine, roughly one time in five it never wakes back up (this seems to
happen exclusively after a gaming session): the screen stays completely black,
the keyboard is dead, and I have to force a power-off.
Looking at the logs, I noticed that every time it fails, KWin throws these
errors right when the nvidia-suspend service is running:
kwin_core: Failed to open /dev/dri/renderD128 device (No such device)
kwin_core: Failed to authenticate the drm magic token. path: "/dev/dri/card1"
error: Permission denied
kwin_wayland_drm: Atomic modeset test failed! Permission denied
kwin_core: Applying output configuration failed!
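
For anyone who wants to check their own journal for the same signature, here is a minimal sketch that scans log lines for the four messages quoted above. The function name `find_suspend_failures` and the pattern list are hypothetical helpers written for this report, not part of KWin or systemd, and the patterns only cover the exact messages seen here.

```python
import re
import sys

# Patterns built from the KWin messages quoted in this report.
# They are an assumption about what identifies a failed suspend, not an official list.
FAILURE_PATTERNS = [
    r"Failed to open /dev/dri/\S+ device \(No such device\)",
    r"Failed to authenticate the drm magic token",
    r"Atomic modeset test failed! Permission denied",
    r"Applying output configuration failed!",
]


def find_suspend_failures(lines):
    """Return (line_number, line) pairs that match any known failure message."""
    compiled = [re.compile(pattern) for pattern in FAILURE_PATTERNS]
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(rx.search(line) for rx in compiled):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Read log text from standard input and print every matching line.
    for number, line in find_suspend_failures(sys.stdin):
        print(f"{number}: {line}")
```

Saved under a name of your choice (for example `scan.py`), it can be fed the previous boot's journal with `journalctl -b -1 | python3 scan.py`.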
From what I understand: the NVIDIA driver is shutting down the GPU for suspend,
and at the same moment KWin is still trying to access the DRM devices that just
disappeared. Instead of handling this gracefully, the whole system freezes and
becomes unrecoverable.
I know the root cause probably comes from the NVIDIA driver (removing the
devices too early), and I've already reported it on their side. But my question
for KWin: would it be possible for the compositor to handle a GPU disappearing
during suspend more gracefully, instead of freezing the entire session? Other
environments seem to survive this.
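
To illustrate the kind of behaviour being asked for, here is a minimal sketch of opening a DRM node while treating "the device went away" as a recoverable condition rather than a fatal one. It is not KWin code (KWin is C++), and Python is used only to keep the example short. The helper name `try_open_drm_node` and the choice of errno values are assumptions based on the "No such device" and "Permission denied" errors in the log above.

```python
import errno
import os

# errno values treated as "the device has disappeared or is temporarily
# inaccessible". This set is an assumption for the sketch.
DEVICE_GONE = {errno.ENODEV, errno.ENOENT, errno.EACCES}


def try_open_drm_node(path):
    """Open a DRM node; return None instead of raising if it has vanished."""
    try:
        return os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError as exc:
        if exc.errno in DEVICE_GONE:
            # The caller can skip this GPU and retry after resume.
            return None
        raise  # Anything else is unexpected and should still surface.
```

A caller would check for `None`, stop using that GPU for the time being, and try again once the device nodes reappear after resume.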
I also noticed something that might be related: during normal usage (no suspend
at all), KWin regularly logs "atomic commit failed: Device or resource busy",
and at startup it fails to open /dev/dri/card2 ("Device or resource busy"). So
there might already be a DRM device handling issue in the background.
Setup:

TongFang X6FR558Y laptop (AMD Ryzen 9 9955HX3D + NVIDIA RTX 5080 Laptop,
hybrid/Optimus config)
PikaOS 4 (Debian sid based), kernel 7.0.11-pikaos
NVIDIA driver 595.80 (open kernel module)
Plasma 6 Wayland session
s2idle suspend only (no S3 available in firmware)

The bug reproduces identically across two NVIDIA driver versions (595.71 then
595.80), and more often after a gaming session.
I can attach full logs (failed-boot journal + KWin logs) if needed.
