https://bugs.kde.org/show_bug.cgi?id=512511

--- Comment #6 from Nikola Novoselec <[email protected]> ---
I spent some time digging into this and I think I've found what's going on.

TL;DR: DSC modes can't wake from DPMS because KWin's atomic path doesn't
request a full modeset, but DSC needs link retraining after the display powers
down. The driver doesn't help either - it reports everything as "Good" even
when the link is dead.

My Setup
- KWin 6.5.3, CachyOS (Arch-based), kernel 6.17.9-2-cachyos
- RTX 3090 with nvidia-open 580.105.08
- Samsung Odyssey G95NC at 7680x2160@120Hz over DisplayPort
- Wayland session, VRR off, HDR off, Night Color off
- Kernel params: nvidia_drm.modeset=1 nvidia_drm.fbdev=1

What Happens
Expected: 
- Screen wakes up when I move the mouse or press a key.

Actual:
- Lock screen, let DPMS kick in
- Try to wake up (mouse/keyboard)
- Screen stays black. Audio keeps playing, SSH works fine
- Ctrl+Alt+F2 then Ctrl+Alt+F1 brings everything back instantly
- Every. Single. Time.

What I Found
I started testing different modes to figure out what's actually broken:

Resolution           DSC Required?           DPMS Wake
7680x2160 @ 120Hz    Yes (~48 Gbps)          Black screen
3840x2160 @ 120Hz    Borderline (~24 Gbps)   Corruption
3840x2160 @ 60Hz     No (~12 Gbps)           Works fine

So it's clearly tied to DSC. The pattern is consistent - if the mode needs DSC,
wake fails.
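
For reference, the bandwidth figures in the table can be reproduced with some quick arithmetic. This is a rough sketch, assuming 24 bits per pixel, counting active pixels only (blanking is ignored, so real requirements are somewhat higher), and assuming a 4-lane HBR3 link (8.1 Gbit/s per lane with 8b/10b encoding). The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed data rate of the active pixels in Gbit/s.
// Blanking intervals are ignored, so this is a lower bound.
inline double pixelRateGbps(std::uint32_t width, std::uint32_t height,
                            std::uint32_t refreshHz, std::uint32_t bitsPerPixel)
{
    return static_cast<double>(width) * height * refreshHz * bitsPerPixel / 1e9;
}

// Payload capacity of an HBR3 DisplayPort link in Gbit/s:
// 8.1 Gbit/s raw per lane, of which 8b/10b encoding leaves 80% for data.
inline double hbr3PayloadGbps(std::uint32_t lanes)
{
    assert(lanes == 1 || lanes == 2 || lanes == 4);
    return 8.1 * lanes * 0.8;
}

// True when the mode does not fit uncompressed and therefore needs DSC.
inline bool needsDsc(double pixelGbps, double linkGbps)
{
    return pixelGbps > linkGbps;
}
```

With these assumptions, 7680x2160 @ 120Hz comes out at about 47.8 Gbit/s against roughly 25.9 Gbit/s of link payload, so it cannot work without DSC. 3840x2160 @ 120Hz is about 23.9 Gbit/s before blanking, which is why it sits on the edge, and at 30 bits per pixel it no longer fits.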

Looking at journalctl during wake attempts:

kwin_wayland[5884]: atomic commit failed: Permission denied
kwin_wayland[5884]: Atomic modeset test failed! Permission denied
kwin_wayland[5884]: Applying output configuration failed!

Why I Think This Happens
DSC requires full link training - DPCD negotiation, encoder/decoder sync, the
works. When the monitor goes to sleep, it tears down the DisplayPort link
completely (at least this Samsung does).

When KWin tries to wake it up, it appears to just flip CRTC.ACTIVE from 0 to 1
via DrmPipeline::commitPipelines() and submits an atomic commit without
DRM_MODE_ATOMIC_ALLOW_MODESET. The driver sees a dead link and can't satisfy
the request without retraining.
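
To make the flag difference concrete, here is a toy model, not KWin or nvidia-drm code. ToyLink, CommitFn and commitWake are hypothetical names. The flag value matches DRM_MODE_ATOMIC_ALLOW_MODESET in the kernel's drm_mode.h, and the toy driver returns -EINVAL the way the generic DRM core does when a commit needs a modeset that was not allowed. The retry-on-failure variant shown here is only one possible shape for the workaround.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>

// Same value as DRM_MODE_ATOMIC_ALLOW_MODESET in the kernel UAPI header.
constexpr std::uint32_t kAtomicAllowModeset = 0x0400;

// Hypothetical stand-in for a DisplayPort link whose training was lost
// while the monitor was asleep.
struct ToyLink {
    bool trained = false;

    // Mimics an atomic commit: returns 0 on success, negative errno on failure.
    int commit(std::uint32_t flags)
    {
        if (!trained) {
            if (!(flags & kAtomicAllowModeset)) {
                return -EINVAL; // needs a full modeset, but it was not allowed
            }
            trained = true; // modeset allowed: link gets retrained
        }
        return 0;
    }
};

// Hypothetical stand-in for the compositor's commit call.
using CommitFn = std::function<int(std::uint32_t flags)>;

struct WakeResult {
    int error;        // 0 on success, negative errno otherwise
    bool usedModeset; // true if the retry with ALLOW_MODESET was needed
};

// Try the fast path first; if the driver rejects it, retry with a full modeset.
inline WakeResult commitWake(const CommitFn &commit)
{
    assert(commit && "commit callback must be set");
    if (commit(0) == 0) {
        return {0, false};
    }
    return {commit(kAtomicAllowModeset), true};
}
```

In this model a link that lost training only comes back through the second commit, which is the situation described above. A link that is still trained wakes on the fast path and never pays for the modeset.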

Why -EPERM specifically
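The -EPERM from nvidia-drm is, as far as I can tell, their way of saying "I
can't do this without a modeset". Standard DRM usually returns -EINVAL for bad
configs, but NVIDIA seems to return a permission error when resource allocation
fails or the hardware state requires a full modeset to proceed. (Strictly
speaking, the "Permission denied" text in the journal is the strerror() string
for EACCES; EPERM prints as "Operation not permitted".)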

Basically KWin is telling the driver "turn this on, but don't touch link
training or bandwidth allocation" and the driver responds "permission denied -
I literally cannot do that with a dead DSC link."

Why VT switch fixes it
VT switch forces drmDropMaster()/drmSetMaster(). KWin re-acquires master,
treats the GPU as "new", and does full initialization including link training.
This proves the hardware and driver can wake correctly - it's just the atomic
fast-path that's broken for DSC.

I also looked at MR !8282 but that's addressing hotplug state consistency
during display connect/disconnect, not this DPMS wake path.

The Kicker: Driver Doesn't Report the Failure
Here's where it gets interesting. I captured modetest via SSH while the screen
was black (before VT switch recovery):

Connector 139 (DP-3): connected 1400x400mm
  props:
    DPMS: On (value: 0)
    link-status: Good (value: 0)
    CRTC_ID: 138

Even during the failure state, the driver reports:

- DPMS: On - driver thinks display is on
- link-status: Good - driver reports no link problems

The nvidia-drm driver doesn't surface the DSC link failure to userspace. It
silently fails the atomic commit with -EPERM while reporting everything as
healthy. This means KWin can't use link-status: Bad as a trigger for modeset -
the driver never sets it.

This further supports unconditionally requesting modeset on DPMS wake, since
there's no reliable signal from the driver that re-training is needed.

Proposed Fix
The atomic wake path needs to request a full modeset when leaving DPMS Off.
Should be a minimal change in DrmOutput::setDpmsMode():

bool DrmOutput::setDpmsMode(DpmsMode requested)
{
    if (requested == DpmsMode::On && m_dpmsMode != requested) {
        // Link may have been torn down, allow driver to retrain
        pipeline()->setModesetRequested(true);
    }
    return commitDpms(requested);
}

This adds DRM_MODE_ATOMIC_ALLOW_MODESET to the commit, letting the kernel do
its thing with link training and DSC negotiation.

Performance impact: Adds ~20-40ms to the wake path on high-refresh panels.
Honestly that's nothing compared to a black screen.

Future optimization: Could theoretically be made conditional on:
- The upcoming kernel flag DRM_MODE_FLAG_DSC
- link-status property being Bad

However, as shown in the logs above, nvidia-drm doesn't set link-status: Bad
when DSC link training fails - it just silently rejects the atomic commit.
Until drivers reliably surface link failures, forcing modeset unconditionally
on wake is the only robust fix.
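
The trade-off between the two policies can be sketched as a small decision function. This is illustrative only: WakePolicy, shouldAllowModeset and the simplified DpmsMode enum are hypothetical names, not KWin API. The LinkStatus values mirror DRM_MODE_LINK_STATUS_GOOD and DRM_MODE_LINK_STATUS_BAD from the kernel headers.

```cpp
#include <cassert>

// Values mirror DRM_MODE_LINK_STATUS_GOOD / DRM_MODE_LINK_STATUS_BAD.
enum class LinkStatus { Good = 0, Bad = 1 };

// Simplified: Standby and Suspend are treated like Off here.
enum class DpmsMode { On, Off };

// Hypothetical policies for when to add ALLOW_MODESET to the wake commit.
enum class WakePolicy { OnlyIfLinkBad, Always };

inline bool shouldAllowModeset(WakePolicy policy, DpmsMode current,
                               DpmsMode requested, LinkStatus reported)
{
    // Only the transition out of a powered-down state is of interest.
    const bool waking = current != DpmsMode::On && requested == DpmsMode::On;
    if (!waking) {
        return false;
    }
    switch (policy) {
    case WakePolicy::Always:
        return true;
    case WakePolicy::OnlyIfLinkBad:
        return reported == LinkStatus::Bad;
    }
    assert(false && "unhandled WakePolicy");
    return false;
}
```

With the driver reporting link-status: Good during the failure, the OnlyIfLinkBad policy returns false on wake and the black screen persists. Only the Always policy covers that case.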

Other Compositors
I'm not the only one hitting this - wlroots, Hyprland and Sway are all adding
the same workaround:
- swaywm/wlroots#2373
- hyprwm/Hyprland#2696

There's also kernel-side discussion about making link-status flip to Bad
automatically when DSC is lost, so userspace can react without guessing. Until
that lands, forcing modeset on wake is the pragmatic fix.

Logs
KWin Journal (failure sequence)
Nov 29 23:24:24 kwin_wayland[6070]: Failed to delay sleep: Sender is not authorized to send message
Nov 29 23:24:57 kwin_wayland[6070]: Atomic modeset test failed! Permission denied
Nov 29 23:24:57 kwin_wayland[6070]: Applying output configuration failed!

Recovery via VT switch succeeded immediately after.

drm_info (working state)
Node: /dev/dri/card1
├───Driver: nvidia-drm (NVIDIA DRM driver) version 0.0.0
│   ├───DRM_CLIENT_CAP_ATOMIC supported
│   └───DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP = 1
├───Device: PCI 10de:2204 NVIDIA Corporation GA102 [GeForce RTX 3090]
└───Connector 3 (DisplayPort)
    ├───Status: connected
    ├───Physical size: 1400×400 mm
    ├───Mode: 7680×2160@120.00 phsync nvsync
    ├───EDID: Samsung Electric Company Odyssey G95NC
    ├───Properties
    │   ├───"DPMS": enum {On, Standby, Suspend, Off} = On
    │   ├───"link-status": enum {Good, Bad} = Good
    │   ├───"CRTC_ID" (atomic): object CRTC = 62
    │   └───"vrr_capable" (immutable): range [0, 1] = 1
    └───CRTC 0
        ├───Object ID: 62
        ├───ACTIVE: 1
        ├───MODE_ID: blob 152 (7680×2160@120.00)
        └───VRR_ENABLED: 0

modetest connector (before failure)
Connector 3 (139): DisplayPort
  status: connected
  modes:
    7680x2160 120.00 7680 7728 7760 7800 2160 2163 2168 2222 207640 phsync nvsync
  props:
    DPMS: On
    link-status: Good
    CRTC_ID: 62

modetest connector (DURING failure - black screen)

Captured via SSH while screen was black, before VT switch recovery:
Connector 139 (DP-3): connected 1400x400mm
  props:
    DPMS: On (value: 0)
    link-status: Good (value: 0)
    CRTC_ID: 138

This is significant: even during the failure state, the driver reports
everything as healthy. It silently fails the atomic commit with -EPERM while
claiming the link is fine.

Also tested with the proprietary NVIDIA driver (565.77) - same behavior.
