https://bugs.kde.org/show_bug.cgi?id=522254
Bug ID: 522254
Summary: KWin loses DRM device access on hybrid-sleep resume
(multi-GPU system): "Failed to open drm device" →
black desktop with placeholder screen
Classification: Plasma
Product: kwin
Version First Reported In: unspecified
Platform: EndeavourOS
OS: Linux
Status: REPORTED
Severity: major
Priority: NOR
Component: wayland-generic
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 193760
--> https://bugs.kde.org/attachment.cgi?id=193760&action=edit
Capture of the journal after the failure condition
Summary
On a dual-GPU system (AMD iGPU + AMD RX 9070 XT discrete), KWin fails to
reclaim the DRM device file descriptor after hybrid-sleep resume. The result
is a completely black desktop with no taskbar: application windows remain
alive and moveable, but the compositor has no outputs. The failure is a logind
DRM seat re-grant race: KWin attempts to reconfigure outputs immediately upon
receiving PrepareForSleep(false), before logind has finished re-granting the
DRM device to the session via ResumeDevice.
The same failure occurs in a milder form during mid-session screen blanking
(3 occurrences this boot before the hybrid-sleep event), but KWin recovers
fast enough that the user does not notice. On hybrid-sleep resume the failure
is terminal.
System Information
Field:              Value
KWin version:       6.7.1-1
Plasma version:     6.7.1-1
OS:                 EndeavourOS (BUILD_ID=2025.03.19)
Kernel:             7.0.13-arch1-1
systemd:            261-1
Session type:       Wayland (Type=wayland, Seat=seat0, TTY=tty1)
Display server:     KWin Wayland (kwin_wayland, DRM backend, auto-selected)
Primary GPU:        AMD Radeon RX 9070 XT, 0000:03:00.0 → /dev/dri/card1
Secondary GPU:      AMD Granite Ridge iGPU, 0000:7d:00.0 → /dev/dri/card0
Monitor connection: DP-9 on card1 (discrete GPU only; iGPU has no connected display)
User groups:        atlas-six sys wheel render video rfkill plugdev ...
GPU identification from lspci:
03:00.0 VGA compatible controller: AMD/ATI Navi 48 [Radeon RX 9070/9070 XT/9070 GRE] (rev c0)
7d:00.0 VGA compatible controller: AMD/ATI Granite Ridge [Radeon Graphics] (rev c9)
DRM device permissions at time of failure:
crw-rw----+ 1 root video 226, 0 Jun 26 12:29 card0
crw-rw----+ 1 root video 226, 1 Jun 26 12:29 card1
crw-rw-rw- 1 root render 226, 128 Jun 26 08:39 renderD128
crw-rw-rw- 1 root render 226, 129 Jun 26 08:39 renderD129
(The + on card0/card1 indicates logind ACL management is active on these
nodes.)
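To check which nodes carry an ACL and what it grants, a minimal sketch follows. It assumes `ls -l` output in the format shown above; `nodes_with_acl` is a hypothetical helper name introduced here, and `getfacl` comes from the acl package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ls -l` output on stdin and print the name of
# every entry whose mode field ends in "+", i.e. has an ACL attached.
nodes_with_acl() {
    awk '$1 ~ /\+$/ { print $NF }'
}

# Intended use on the affected machine (not executed here):
#   ls -l /dev/dri | nodes_with_acl
#   getfacl /dev/dri/card1    # lists the per-user entries behind the "+"
```

Running `getfacl` once before suspend and once in the broken state would show whether the session user's entry is actually missing at the time of failure.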
Reproduction:
Allow the system to enter hybrid-sleep organically (via KDE power management).
Direct systemctl hybrid-sleep does not reliably reproduce the issue; it appears
to require the natural power management sleep path.
Steps:
- Boot into KDE Plasma Wayland session on dual-GPU system (iGPU + discrete AMD GPU)
- Leave system idle until hybrid-sleep triggers via power management
- Wake system (keyboard/mouse input)
- Observe: application windows are present and moveable, but desktop is
  entirely black, taskbar/panel is absent, no DE interaction is possible
Frequency:
100% reproducible via organic hybrid-sleep on this system.
Failure Sequence (from journal)
Sleep initiated at 12:01:55, resume at 12:29:11 (27 minutes, within the
suspend-to-RAM phase of hybrid-sleep; this is a RAM resume, not a hibernate
image restore):
Jun 26 12:01:55 powerdevil: PrepareForSleep(true=prepare)
Jun 26 12:01:55 systemd[1]: Starting System Hybrid Suspend+Hibernate...
Jun 26 12:29:11 systemd[1]: Finished System Hybrid Suspend+Hibernate.
Jun 26 12:29:11 powerdevil: PrepareForSleep(false=resume)
Jun 26 12:29:11 kwin_wayland: Applying output configuration failed!
Jun 26 12:29:11 kwin_wayland: Failed to open drm device
Jun 26 12:29:11 kwin_wayland: Failed to open drm device
Jun 26 12:29:11 ksecretd: There are no outputs - creating placeholder screen
Jun 26 12:29:11 polkit-kde: There are no outputs - creating placeholder screen
Jun 26 12:29:11 kactivitymanagerd: There are no outputs - creating placeholder screen
Jun 26 12:29:11 kdeconnectd: There are no outputs - creating placeholder screen
Jun 26 12:29:11 kwalletd6: There are no outputs - creating placeholder screen
Jun 26 12:29:11 dolphin: There are no outputs - creating placeholder screen
Jun 26 12:29:11 plasmashell: There are no outputs - creating placeholder screen
Jun 26 12:29:11 kscreenlocker: There are no outputs - creating placeholder screen
Jun 26 12:29:11 kded6: There are no outputs - creating placeholder screen
Jun 26 12:29:12 xdg-desktop-portal-kde: There are no outputs - creating placeholder screen
Jun 26 12:29:12 powerdevil: There are no outputs - creating placeholder screen
KWin receives PrepareForSleep(false) and immediately attempts output
reconfiguration. The DRM device open fails because logind has not yet
completed the ResumeDevice re-grant for the session. KWin then reports no
outputs to all connected Wayland clients simultaneously, producing a
permanently black desktop. KWin does not retry; it remains alive but comatose.
Mid-session occurrences (same boot, before hybrid-sleep)
The identical failure occurs 3 times during normal use (likely triggered by
screen blanking), but KWin recovers quickly enough that the user does not
notice:
Jun 26 09:47:32 kwin_wayland[1563]: Failed to open drm device (×2)
Jun 26 09:55:11 kwin_wayland[1563]: Failed to open drm device (×2)
Jun 26 10:27:10 kwin_wayland[1563]: Failed to open drm device (×2)
Total occurrences this boot: 8 log lines (6 mid-session, recoverable, from the
3 events above logging the error twice each, plus 2 on hybrid-sleep resume,
non-recoverable). This pattern strongly suggests the race condition is
systemic and not specific to hybrid-sleep; the resume simply makes it fail
hard enough to not recover.
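The per-boot count can be reproduced from the journal. A minimal sketch, assuming the messages are logged under the syslog identifier kwin_wayland (as in the excerpts above); `count_drm_open_failures` is a hypothetical helper name introduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count "Failed to open drm device" lines in journal
# text read from stdin. grep -c exits non-zero when the count is 0, so the
# "|| true" keeps the helper's own exit status at 0 in that case.
count_drm_open_failures() {
    grep -c 'Failed to open drm device' || true
}

# Intended use for the current boot (not executed here):
#   journalctl -b 0 -t kwin_wayland --no-pager | count_drm_open_failures
```

Passing `-b -1` instead of `-b 0` would count the previous boot, which makes it easy to compare an affected boot against one with the workaround applied.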
powerdevil DRM connector churn during resume
During the ~17 seconds between PrepareForSleep(false) and the system settling,
powerdevil logs show rapid alternating uevents for card0 and card1 as the DRM
subsystem re-enumerates both GPUs. powerdevil also reports stale connector IDs
that no longer exist:
Jun 26 12:29:22 powerdevil: Could not find DRM connector for connector id: 473
Jun 26 12:29:23 powerdevil: Could not find DRM connector for connector id: 482
This connector ID churn is consistent with the DRM subsystem reassigning IDs
during re-enumeration on resume, which is a known behavior with AMD display
hardware.
Root Cause Analysis
The failure is a logind DRM seat re-grant race condition. The sequence:
- On suspend, logind revokes the session's DRM device fd via PauseDevice
- On resume, logind signals PrepareForSleep(false) via D-Bus
- KWin reacts to this signal and immediately calls TakeDevice (or equivalent)
  to reclaim the DRM device
- Race: logind has not yet processed the resume and re-granted DRM seat access
- TakeDevice returns Permission Denied / device unavailable
- KWin treats this as a permanent failure, marks all outputs gone, notifies
  all Wayland clients
- The desktop goes black; KWin does not retry
On a dual-GPU system the race window is wider because the DRM subsystem
generates a large volume of uevents for both GPUs on resume (card0 + card1 +
their connectors), delaying logind's re-grant processing.
The + ACL on /dev/dri/card* confirms logind ACL management is active. The user
is a member of both video and render groups (added as part of diagnosing this
issue), but this did not resolve the bug; group membership is a fallback that
does not apply when logind's dynamic ACL hasn't been re-applied yet.
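One way to test this hypothesis is to record the order in which logind emits the relevant signals around a resume. The sketch below assumes root access: PauseDevice and ResumeDevice are believed to be addressed to the session controller rather than broadcast, so a bus monitor is needed to see them. `login1_match` and `watch_resume_order` are hypothetical helper names; the interface and member names are those of logind's D-Bus API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a D-Bus match rule for one login1 signal.
#   $1 = interface suffix (Manager or Session), $2 = signal name
login1_match() {
    local interface=$1 member=$2
    printf "type='signal',interface='org.freedesktop.login1.%s',member='%s'" \
        "$interface" "$member"
}

# Hypothetical helper: print the three signals as they cross the system bus.
# Monitoring the system bus needs root; stop with Ctrl+C after the resume.
watch_resume_order() {
    sudo busctl monitor --system \
        --match="$(login1_match Manager PrepareForSleep)" \
        --match="$(login1_match Session PauseDevice)" \
        --match="$(login1_match Session ResumeDevice)"
}
```

Comparing the timestamp of the ResumeDevice signal for card1 (major 226, minor 1) with the first Failed to open drm device line in the journal would show whether KWin acted before the re-grant.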
Expected Behavior
After hybrid-sleep resume, KWin should successfully reclaim the DRM device and
restore all outputs. At minimum, KWin should retry TakeDevice with exponential
backoff when it fails at resume time, rather than immediately propagating
"no outputs" to all Wayland clients. Ideally, KWin should wait for logind's
ResumeDevice signal before attempting output reconfiguration, rather than
reacting to PrepareForSleep(false) directly.
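The retry policy suggested above can be sketched in shell, purely to illustrate the control flow. This is not KWin code (KWin is C++ and obtains the device fd from logind over D-Bus); `retry_with_backoff` and the `SLEEP_CMD` override are hypothetical names introduced for this example.

```bash
#!/usr/bin/env bash
# Illustrative only: run a command until it succeeds, doubling the delay
# between attempts. Not KWin code; all names here are hypothetical.
#   $1 = maximum number of attempts, $2 = initial delay in seconds,
#   remaining arguments = the command to run.
retry_with_backoff() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0            # command succeeded, stop retrying
        fi
        if (( attempt >= max_attempts )); then
            return 1            # give up only after the last attempt
        fi
        # SLEEP_CMD can be overridden (e.g. with "true") to skip real waits.
        "${SLEEP_CMD:-sleep}" "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example shape of a call (not executed here):
#   retry_with_backoff 5 1 some_command_that_may_fail_transiently
```

The point of the sketch is the ordering: "no outputs" would be reported to clients only after the final attempt fails, not after the first.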
Suggested Fix Area
- src/backends/drm/drm_gpu.cpp: DRM device open / TakeDevice call site
- src/backends/drm/session_logind.cpp (if present): PrepareForSleep /
  ResumeDevice signal handling; the fix should wait for ResumeDevice before
  attempting device reconfiguration rather than acting on
  PrepareForSleep(false) immediately
Note: work/zamundaaa/survive-all-gpu-removal branch appears to be working in
this problem space and may already address this.
Confirmed Workaround
Setting KWIN_DRM_DEVICES=/dev/dri/card1 in
~/.config/plasma-workspace/env/kwin-drm.sh pins KWin to the discrete GPU only
and fully resolves the issue in testing.
How to apply
mkdir -p ~/.config/plasma-workspace/env/
echo 'export KWIN_DRM_DEVICES=/dev/dri/card1' > ~/.config/plasma-workspace/env/kwin-drm.sh
chmod +x ~/.config/plasma-workspace/env/kwin-drm.sh
Log out and back in for the change to take effect.
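After logging back in, it may be worth confirming that the variable actually reached the compositor. A minimal sketch, assuming a kwin_wayland process owned by the current user; `env_value` is a hypothetical helper name introduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a NUL-separated environment block on stdin
# (the format of /proc/PID/environ) and print the value of variable $1.
env_value() {
    local name=$1
    tr '\0' '\n' | sed -n "s/^${name}=//p"
}

# Intended use (not executed here); pgrep -o picks the oldest matching
# process and -x requires an exact name match:
#   env_value KWIN_DRM_DEVICES < "/proc/$(pgrep -o -x kwin_wayland)/environ"
```

Empty output would mean the env script was not picked up by the session, in which case the workaround is not in effect.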
Verification
With KWIN_DRM_DEVICES=/dev/dri/card1 set, a ~2 hour organic hybrid-sleep cycle
(13:53:49 → 15:50:45) produced zero Failed to open drm device occurrences and
a clean resume. KWin was completely silent after PrepareForSleep(false): no
output configuration failure, no placeholder screen cascade. The only
post-resume KWin log entry was a routine window focus event (could not find
the toplevel to activate).
Jun 26 13:53:49 powerdevil: PrepareForSleep(true=prepare)
Jun 26 13:53:49 systemd[1]: Starting System Hybrid Suspend+Hibernate...
Jun 26 15:50:45 systemd[1]: Finished System Hybrid Suspend+Hibernate.
Jun 26 15:50:45 powerdevil: PrepareForSleep(false=resume)
Jun 26 15:50:46 kwin_wayland: could not find the toplevel to activate KWin::SurfaceInterface(...)
--- (no further KWin errors; desktop resumed normally) ---
Total Failed to open drm device count this boot: 0 (vs 8 on the affected boot).
Why this works
By excluding card0 (iGPU) from KWin's device set, the uevent storm on resume
is halved: logind only needs to process re-enumeration events for one GPU
instead of two, closing the race window sufficiently for the ResumeDevice
re-grant to complete before KWin attempts TakeDevice. This is a workaround,
not a fix; the underlying race still exists and can still affect single-GPU
systems or systems where KWIN_DRM_DEVICES is not set.
Additional Notes:
- Issue does not reproduce with systemctl hibernate (straight hibernate from
  power off state), only with organic hybrid-sleep via KDE power management
- The kscreenlocker_greet PAM conversation failure seen in earlier sessions
  appears to be a symptom/consequence of the missing outputs, not a cause
- KWin process remains alive throughout; this is not a compositor crash
- All application windows remain functional and moveable during the broken
  state
- KSplash service times out approximately 60 seconds after resume, consistent
  with it waiting for a KWin "ready" signal that never arrives
Related:
KDE Bug 454433 — TakeDevice Permission Denied (related startup issue)
systemd issue #23547 — TakeDevice fails with Permission Denied on dGPU wake
work/zamundaaa/survive-all-gpu-removal — in-progress KWin branch in this space