https://bugs.kde.org/show_bug.cgi?id=520562

            Bug ID: 520562
           Summary: Compositor wedges enabling 3rd output on NVIDIA RTX
                    4090: atomic modeset test rejected with EINVAL, GL
                    framebuffer-incomplete storm
    Classification: Plasma
           Product: kwin
Version First Reported In: 6.6.5
          Platform: EndeavourOS
                OS: Linux
            Status: REPORTED
          Severity: crash
          Priority: NOR
         Component: wayland-generic
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

# Bug report — ready to paste into bugs.kde.org

## Top-of-form fields

- **Classification:** Plasma
- **Product:** kwin
- **Component:** wayland-generic
- **Version:** 6.6.5
- **Severity:** crash
- **Hardware:** Other (or PC / x86_64 if available)
- **OS:** Linux

## Summary field (one line)

```
DRM atomic test rejected with EINVAL on 3-pipeline (2× 1440p portrait + 1×
4K60) NVIDIA RTX 4090 configuration; compositor proceeds into GL path,
producing 1000+ GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT errors and wedges
```

## Description textarea

Paste everything below (after the `---` line) into the Description box,
**after** deleting the placeholder comment block (the lines starting with `***`
and ending at the second `***`).

---

DESCRIPTION

When a third connector is enabled at runtime on an NVIDIA RTX 4090 (Ada
Lovelace) running the proprietary `nvidia` driver,
`DrmGpu::testPendingConfiguration()` cannot find any CRTC assignment that the
kernel accepts. Every `drmModeAtomicCommit(..., DRM_MODE_ATOMIC_TEST_ONLY |
DRM_MODE_ATOMIC_ALLOW_MODESET, ...)` call returns `EINVAL`. KWin exhausts its
3-second combinatorial CRTC search and the `m_forceLowBandwidthMode = true`
retry, then propagates the error upward. The GL compositor backend then emits
~1,000 `GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT` errors and the session becomes
unresponsive. Recovery requires a hard reboot.

This is a runtime output-enable failure, not a startup or cable hot-plug
failure. With the third monitor's cable connected but its KWin output disabled,
the system is fully stable across reboots. Enabling the output via Display
Settings or `kscreen-doctor output.DP-2.enable` triggers the failure
deterministically. The same failure reproduces on `nvidia-beta-dkms
595.71.05-1` and `nvidia-open-dkms 595.71.05-2`.


STEPS TO REPRODUCE

1. Boot with only DP-1 (Dell AW2725DF, 2560×1440 portrait @ 360 Hz, 10 bpc) and
DP-3 (ASUS PG27AQWP-W, 2560×1440 portrait @ 540 Hz, 10 bpc) connected. Plasma
comes up healthy and remains stable.
2. Plug a BenQ PD3200U (3840×2160 @ 60 Hz, 8 bpc) into DP-2 with the output
disabled in KWin: Display Settings → DP-2 → Disabled → Apply. The system
remains stable in this state across reboots, indicating the failure is not
caused by the cable, EDID, or initial probe.
3. Enable the DP-2 output via Display Settings → Enabled → Apply, or run
`kscreen-doctor output.DP-2.enable`.


OBSERVED RESULT

Within ~1 second, KWin emits ~37 `Atomic modeset test failed! Invalid argument`
warnings. With a small local diagnostic patch (in ADDITIONAL INFORMATION
below), each warning is followed by per-pipeline
connector/CRTC/mode/needsModeset lines, showing the recursion across CRTC
permutations.

Within the next ~4 seconds, the GL backend emits ~1,032 paired
`GL_INVALID_OPERATION error generated. <image> and <target> are incompatible.`
and `Invalid framebuffer status: GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT` errors.

The compositor then stops emitting log output. The display freezes on the last
rendered frame. The kernel hung-task watchdog begins dumping the loaded-modules
list every ~3 seconds. Recovery requires Alt+SysRq+REISUB or a hard reboot.


EXPECTED RESULT

Either:
(a) the kernel-side atomic test accepts the 3-pipeline configuration on this
hardware, and DP-2 enables successfully, or
(b) `testPendingConfiguration()` fails cleanly, KWin reverts to the prior
2-monitor configuration, and the user sees a clear error indicating that DP-2
could not be enabled — without entering the GL render-setup retry loop that
produces 1000+ framebuffer-incomplete errors and wedges the session.


SOFTWARE/OS VERSIONS

Operating System: EndeavourOS (Arch-based rolling release)
KDE Plasma Version: 6.6.5
KDE Frameworks Version: 6.26.0
Qt Version: 6.11.1
Kernel Version: 7.0.9-zen2-1-zen (also reproduces on 7.0.9-arch2-1)
Graphics Platform: Wayland
Graphics Processor: NVIDIA GeForce RTX 4090 (Ada Lovelace, AD102), 24 GiB VRAM,
PCIe 4.0 x16
NVIDIA driver: nvidia-beta-dkms 595.71.05-1 + nvidia-utils-beta 595.71.05-1
               (also reproduces with nvidia-open-dkms 595.71.05-2 +
nvidia-utils-beta 595.71.05-1)
KWin source under analysis: tag v6.6.4, commit
203f8d127b93b7ced97345a3b640ae22bb5d5919


ADDITIONAL INFORMATION

== Diagnostic patch ==

A small local patch to `src/backends/drm/drm_pipeline.cpp` extends the existing
`TestAllowModeset` failure warning with per-pipeline state. The patch is fully
contained inside the existing failure branch, has no runtime cost on the
success path, and exposes the data needed to diagnose this and similar
failures.

--- a/src/backends/drm/drm_pipeline.cpp
+++ b/src/backends/drm/drm_pipeline.cpp
@@ -135,6 +135,18 @@ DrmPipeline::Error DrmPipeline::commitPipelinesAtomic(...)
     case CommitMode::TestAllowModeset: {
         if (!commit->testAllowModeset()) {
             qCWarning(KWIN_DRM) << "Atomic modeset test failed!" << strerror(errno);
+            qCWarning(KWIN_DRM, "  pipelines in commit: %lld", static_cast<long long>(pipelines.size()));
+            for (DrmPipeline *p : pipelines) {
+                const auto m = p->mode();
+                const QString conn = p->connector() ? p->connector()->modelName() : QStringLiteral("?");
+                const uint32_t crtcId = p->crtc() ? p->crtc()->id() : 0;
+                qCWarning(KWIN_DRM, "    pipeline connector=\"%s\" crtc=%u mode=%dx%d@%uHz needsModeset=%d",
+                          qPrintable(conn), crtcId,
+                          m ? m->size().width() : 0,
+                          m ? m->size().height() : 0,
+                          m ? m->refreshRate() : 0,
+                          p->needsModeset());
+            }
             return errnoToError();
         }


== Trace excerpt ==

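Captured via `journalctl --user -b --identifier=kwin_wayland`. The excerpt
below shows the first 3 of 37 failures; the full ~24 MB systemd-journal export
is available on request. Refresh values are printed in millihertz despite the
`Hz` suffix, because the patch logs `refreshRate()` unscaled (359979
corresponds to about 360 Hz).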

May 23 14:45:53 host kwin_wayland: Atomic modeset test failed! Invalid argument
May 23 14:45:53 host kwin_wayland:   pipelines in commit: 3
May 23 14:45:53 host kwin_wayland:     pipeline connector="AW2725DF"     crtc=62  mode=2560x1440@359979Hz needsModeset=0
May 23 14:45:53 host kwin_wayland:     pipeline connector="PG27AQWP-W"   crtc=81  mode=2560x1440@540000Hz needsModeset=0
May 23 14:45:53 host kwin_wayland:     pipeline connector="BenQ PD3200U" crtc=100 mode=3840x2160@59997Hz  needsModeset=0
May 23 14:45:54 host kwin_wayland: Atomic modeset test failed! Invalid argument
May 23 14:45:54 host kwin_wayland:   pipelines in commit: 3
May 23 14:45:54 host kwin_wayland:     pipeline connector="AW2725DF"     crtc=62  mode=2560x1440@359979Hz needsModeset=0
May 23 14:45:54 host kwin_wayland:     pipeline connector="PG27AQWP-W"   crtc=81  mode=2560x1440@540000Hz needsModeset=0
May 23 14:45:54 host kwin_wayland:     pipeline connector="BenQ PD3200U" crtc=119 mode=3840x2160@59997Hz  needsModeset=0
May 23 14:45:54 host kwin_wayland: Atomic modeset test failed! Invalid argument
May 23 14:45:54 host kwin_wayland:   pipelines in commit: 3
May 23 14:45:54 host kwin_wayland:     pipeline connector="AW2725DF"     crtc=100 mode=2560x1440@359979Hz needsModeset=0
May 23 14:45:54 host kwin_wayland:     pipeline connector="PG27AQWP-W"   crtc=81  mode=2560x1440@540000Hz needsModeset=0
May 23 14:45:54 host kwin_wayland:     pipeline connector="BenQ PD3200U" crtc=62  mode=3840x2160@59997Hz  needsModeset=0

[... 34 further failures, cycling through every permutation
     of CRTCs {62, 81, 100, 119} across the three connectors ...]

May 23 14:45:55 host kwin_wayland: GL_INVALID_OPERATION error generated. <image> and <target> are incompatible.
[... ×1032 paired with GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT, over ~4 seconds ...]
May 23 14:45:58 host kwin_wayland: Invalid framebuffer status: GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT
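
For anyone post-processing the journal export, the per-pipeline lines can be parsed mechanically. The following is a minimal sketch, assuming the line format produced by the diagnostic patch above; `PipelineEntry`, `parsePipelineLine` and `refreshHz` are hypothetical helper names and are not part of KWin. The value after `@` is treated as millihertz, which is what the logged numbers (359979, 540000, 59997) imply.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// One "pipeline" line as printed by the diagnostic patch (hypothetical type).
struct PipelineEntry {
    std::string connector;
    uint32_t crtcId;
    int width;
    int height;
    uint32_t refreshMilliHz; // logged with an "Hz" suffix, but the value is mHz
};

// Parses a line such as
//   ... pipeline connector="AW2725DF" crtc=62 mode=2560x1440@359979Hz needsModeset=0
// Returns std::nullopt for lines that do not contain a pipeline record.
inline std::optional<PipelineEntry> parsePipelineLine(const std::string &line)
{
    static const std::regex re(
        R"rx(connector="([^"]*)"\s+crtc=(\d+)\s+mode=(\d+)x(\d+)@(\d+)Hz)rx");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return std::nullopt;
    }
    return PipelineEntry{
        m[1].str(),
        static_cast<uint32_t>(std::stoul(m[2].str())),
        std::stoi(m[3].str()),
        std::stoi(m[4].str()),
        static_cast<uint32_t>(std::stoul(m[5].str())),
    };
}

// Converts the logged millihertz value to hertz.
inline double refreshHz(const PipelineEntry &e)
{
    return e.refreshMilliHz / 1000.0;
}
```

Feeding every journal line through `parsePipelineLine` and grouping the results in threes gives the set of connector-to-CRTC assignments that were actually tested, which can then be compared against the full permutation space.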


== Analysis: KWin side ==

The relevant code path is `src/backends/drm/drm_gpu.cpp:412-457`:

- `drm_gpu.cpp:412-414`: `s_checkCrtcTimeout = 3s`, overridable via the
`KWIN_DRM_PENDING_CONFIG_TIMEOUT` environment variable.
- `drm_gpu.cpp:416`: `testPendingConfiguration()` is invoked whenever the
connector set changes (DP-2 hot-enable).
- `drm_gpu.cpp:443-446`: first `checkCrtcAssignment()` pass with
`m_forceLowBandwidthMode = false`. Returns non-None.
- `drm_gpu.cpp:448-456`: if FB2-modifiers are supported or any output requests
`PreferAccuracy`, a second pass runs with `m_forceLowBandwidthMode = true`.
Also returns non-None.
- `drm_pipeline.cpp:137`: `commit->testAllowModeset()` issues the actual
`DRM_IOCTL_MODE_ATOMIC` with `DRM_MODE_ATOMIC_TEST_ONLY |
DRM_MODE_ATOMIC_ALLOW_MODESET`. Returns false; `errno = EINVAL`.

The 37 trace entries correspond to KWin's recursion across CRTC permutations
within the 3-second budget. Both the initial pass and the low-bandwidth retry
exhaust their search without finding a kernel-acceptable assignment.
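
To make the shape of that search concrete, here is a minimal sketch of a deadline-bounded backtracking search over connector-to-CRTC assignments. It is not KWin's `checkCrtcAssignment()`; `findCrtcAssignment`, `SearchResult` and `TestFn` are hypothetical names, and the kernel's test-only commit is replaced by an injected predicate so the sketch runs without DRM.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;
// assignment[i] is the CRTC id tried for connector i.
using Assignment = std::vector<uint32_t>;
// Stand-in for the kernel's test-only atomic commit: true means "accepted".
using TestFn = std::function<bool(const Assignment &)>;

struct SearchResult {
    bool found = false;
    bool timedOut = false;
    int testsRun = 0;
    Assignment assignment;
};

namespace detail {
inline bool search(std::size_t connectorCount, const std::vector<uint32_t> &crtcs,
                   std::vector<bool> &used, Assignment &current,
                   const TestFn &test, Clock::time_point deadline, SearchResult &out)
{
    if (current.size() == connectorCount) {
        // Every connector has a CRTC: run one test commit.
        ++out.testsRun;
        if (test(current)) {
            out.found = true;
            out.assignment = current;
            return true;
        }
        return false;
    }
    for (std::size_t i = 0; i < crtcs.size(); ++i) {
        if (used[i]) {
            continue;
        }
        if (Clock::now() >= deadline) {
            out.timedOut = true;
            return false;
        }
        used[i] = true;
        current.push_back(crtcs[i]);
        if (search(connectorCount, crtcs, used, current, test, deadline, out)) {
            return true;
        }
        current.pop_back();
        used[i] = false;
        if (out.timedOut) {
            return false;
        }
    }
    return false;
}
} // namespace detail

// Tries every ordered assignment of distinct CRTCs to connectors until the
// test accepts one, the search space is exhausted, or the budget runs out.
inline SearchResult findCrtcAssignment(std::size_t connectorCount,
                                       const std::vector<uint32_t> &crtcs,
                                       const TestFn &test,
                                       std::chrono::milliseconds budget)
{
    SearchResult out;
    std::vector<bool> used(crtcs.size(), false);
    Assignment current;
    detail::search(connectorCount, crtcs, used, current, test, Clock::now() + budget, out);
    return out;
}
```

With three connectors and the four CRTCs {62, 81, 100, 119}, an always-rejecting test predicate makes this sketch run 4 × 3 × 2 = 24 tests per pass before giving up, which is the per-pass upper bound for an unpruned search of this kind.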

KWin's existing source acknowledges the absence of a diagnostic channel for
this code path at `drm_gpu.cpp:452-454`:

    if (m_addFB2ModifiersSupported || hasPreferAccuracy) {
        // We currently don't have any information about why the output config
        // got rejected; one possibility is missing memory bandwidth.
        m_forceLowBandwidthMode = true;
        err = checkCrtcAssignment(connectors, crtcs,
                                  std::chrono::steady_clock::now() + s_checkCrtcTimeout);
    }

The `m_forceLowBandwidthMode = true` retry path does not rescue the
configuration on this hardware. Even with primary planes forced to
`lowBandwidthFormats` (per `drm_layer.cpp:128-129`), the kernel continues to
reject the commit.


== Analysis: kernel / driver side ==

`nvidia-drm` rejects every test commit with `EINVAL` and emits nothing to
dmesg:
- No `NVRM:` warnings
- No atomic-test rejection log
- No `nv_drm_atomic_check` failure trace

The `EINVAL` rejection is identical between `nvidia-beta-dkms` and
`nvidia-open-dkms` at version 595.71.05. The open driver was tested
specifically to rule out closed-source EDID handling; the rejection persists.

The `drm.edid_firmware=DP-2:...` kernel parameter is silently ignored by both
driver variants — `nvidia-modeset` does not exercise the upstream
`drm_edid_load` helper. This is not part of this bug, but eliminates a commonly
suggested workaround.


== Analysis: secondary failure (GL retry storm) ==

After `testPendingConfiguration()` returns `Error::InvalidArguments`, the
compositor still enters a GL render setup path that depends on a successful
atomic state. The result is ~1,032 `GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT`
errors in 4 seconds — far more than the number of frames or outputs involved —
followed by no further `kwin_wayland` log output and a kernel hung-task
watchdog. This is consistent with a retry loop that lacks a breaker on prior
DRM-state failure.

This second behaviour is separable from the primary kernel-rejection problem:
when `testPendingConfiguration()` fails, the compositor should not enter the
EGLImage attachment retry loop, since the underlying DRM state is known-bad.
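
The intended behaviour can be sketched as a guard that only ever exposes an output set whose test succeeded. This is a minimal illustration under stated assumptions, not KWin code: `ConfigGuard` and `OutputSet` are hypothetical names, and the atomic test is an injected predicate.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using OutputSet = std::vector<std::string>; // names of enabled outputs

// Keeps the last output set that passed its test. A rejected candidate never
// becomes the active set, so rendering code that iterates active() cannot
// compose against known-bad DRM state.
class ConfigGuard {
public:
    explicit ConfigGuard(OutputSet initial)
        : m_active(std::move(initial))
    {
    }

    // Applies `candidate` only if `test` accepts it; otherwise the previous
    // set stays active and an error message is recorded for the user.
    bool tryApply(const OutputSet &candidate,
                  const std::function<bool(const OutputSet &)> &test)
    {
        if (!test(candidate)) {
            m_lastError = "output configuration rejected by atomic test";
            return false;
        }
        m_active = candidate;
        m_lastError.clear();
        return true;
    }

    const OutputSet &active() const { return m_active; }
    const std::string &lastError() const { return m_lastError; }

private:
    OutputSet m_active;
    std::string m_lastError;
};
```

In this sketch a failed `tryApply` leaves the two-output set in place and populates `lastError()`, which corresponds to outcome (b) under EXPECTED RESULT.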


== Likely root cause ==

The primary failure originates in the NVIDIA driver's atomic-check path for the
specific combination of:
- One 4K @ 60 Hz, 8 bpc connector, plus
- Two 1440p @ high-refresh (360 Hz, 540 Hz) 10 bpc connectors

The 540 Hz panel alone requires a per-pipeline bandwidth of roughly 26 Gbps
(10 bpc, 4:4:4, with DSC); adding the 4K @ 60 Hz BenQ on top of the two
high-refresh pipelines may exceed the GPU's display-engine bandwidth budget.
The driver returns a generic `EINVAL` without indicating bandwidth as the cause.
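
For scale, the uncompressed active-pixel data rates of the three modes can be computed directly. This is a rough sketch: it assumes nominal refresh rates and three colour components per pixel, and it ignores blanking intervals and DSC, so the figures are pre-compression and are not the driver's actual bandwidth model (which is not public). `activePixelBitsPerSecond` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed active-pixel data rate in bits per second for an RGB / 4:4:4
// mode: width * height * refresh * bits-per-component * 3 components.
constexpr uint64_t activePixelBitsPerSecond(uint64_t width, uint64_t height,
                                            uint64_t refreshHz, uint64_t bitsPerComponent)
{
    return width * height * refreshHz * bitsPerComponent * 3;
}

// The three modes from this report, at nominal refresh rates.
constexpr uint64_t aw2725df = activePixelBitsPerSecond(2560, 1440, 360, 10);
constexpr uint64_t pg27aqwp = activePixelBitsPerSecond(2560, 1440, 540, 10);
constexpr uint64_t pd3200u = activePixelBitsPerSecond(3840, 2160, 60, 8);
constexpr uint64_t total = aw2725df + pg27aqwp + pd3200u;

static_assert(aw2725df == 39813120000ULL, "about 39.8 Gbit/s");
static_assert(pg27aqwp == 59719680000ULL, "about 59.7 Gbit/s");
static_assert(pd3200u == 11943936000ULL, "about 11.9 Gbit/s");
static_assert(total == 111476736000ULL, "about 111.5 Gbit/s");
```

The 540 Hz mode is about 59.7 Gbit/s before compression, so the roughly 26 Gbps quoted above is a post-DSC figure. Whether the combined load of about 111.5 Gbit/s of uncompressed scanout is what the driver objects to remains an open question.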

This report targets KWin rather than NVIDIA because:
1. KWin's diagnostic infrastructure is currently the only window into the
failure.
2. The secondary GL retry storm is squarely a KWin concern.
3. The diagnostic patch above (or an equivalent) would meaningfully reduce
time-to-root-cause for similar future reports.

A mirror report against `NVIDIA/open-gpu-kernel-modules` can be filed on
request.


== Requests ==

In rough priority order:

1. Merge the per-pipeline diagnostic dump into mainline KWin. The patched lines
at `drm_pipeline.cpp:138-149` are local to one switch arm, only fire on
failure, and provide the data needed to diagnose this and similar reports. The
dump could be gated behind `KWIN_DRM.warning=true` or a new `KWIN_DRM_DEBUG`
channel.

2. Audit the GL retry path that produces 1,032
`GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT` errors in 4 seconds following a failed
`testPendingConfiguration()`. The compositor should refuse to compose against
an output set whose atomic test was rejected, surface a user-visible error, and
revert to the prior working configuration.

3. Extend the low-bandwidth retry diagnostic. When the `m_forceLowBandwidthMode
= true` retry also fails (`drm_gpu.cpp:455`), log the per-pipeline
modifier/format set on each plane in addition to connector/CRTC. This
distinguishes bandwidth from modifier-mismatch root causes.

4. Coordination. If any KWin maintainer has direct contact with NVIDIA's Linux
DRM team, cross-referencing this report (with the trace) into their tracker
would be valuable.


== Workarounds attempted and confirmed insufficient ==

- `colorPowerTradeoff: PreferEfficiency` + `maxBitsPerColor: 8` in
`~/.config/kwinoutputconfig.json` for all three outputs — does not prevent the
failure. The crash happens during the output-add path, before per-output color
settings take effect.
- Driver swap between `nvidia-beta-dkms` and `nvidia-open-dkms` at the same
upstream version — does not help. On `nvidia-open-dkms`, the error surface
widens: both the atomic and GL classes fire (vs. only one class at a time on
the proprietary driver).
- `drm.edid_firmware=DP-2:...` kernel parameter to force a specific BenQ EDID —
silently ignored by both NVIDIA driver variants.
- `KWIN_DRM_USE_MODIFIERS=0` is contraindicated on NVIDIA hardware (causes a
black-screen via broken EGLImage import) and is explicitly not a workaround for
this bug.


== Related bugs ==

Bug 462214 (RESOLVED FIXED in Plasma 5.26.5, commit
`54a1858316b350b8ee3767d756f516f30b4a5b04`) — Same two KWin error strings
("Failed to find a working setup for new outputs!" + "Atomic modeset test
failed! Invalid argument"), but a different root cause: plane reassignment
commits issued without first disabling the CRTC. That fix is vendor-generic and
merged ~3.5 years before KWin 6.6.4/6.6.5, so it is already present in this
version and is not the EINVAL source here. Original report was on
Qualcomm/Freedreno; the fix applies to NVIDIA equally.

Bug 515835 (ASSIGNED) — Same surface symptom ("Atomic modeset test failed!
Invalid argument" on Wayland) but distinct root cause: i915 DP-MST topology
corruption on KVM-switch hotplug, with Wayland clients receiving an `invalid
global wl_output` protocol error while KWin itself survives. This report
differs: NVIDIA RTX 4090, software-enable of an already-connected SST monitor
(no MST, no hotplug), both `checkCrtcAssignment` passes (including the
`m_forceLowBandwidthMode=true` retry) exhaust, and the compositor itself wedges
in a `GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT` retry storm requiring a hard
reboot. Linked merge requests 8809 / 8810 / 8812 touch atomic-test handling in
adjacent code and may share affected code paths.

The following resolved bugs share the `"Atomic modeset test failed!"` warning
string but were confirmed unrelated by inspection of their fixing commits,
errno class, and trigger conditions:
- Bug 512146 (RESOLVED UPSTREAM, fixed in KWin 6.5.4) —
`outputconfigurationstore` mode-selection policy on VGA + Intel iGPU. Different
errno (EACCES), different trigger (first-boot default mode), unrelated
subsystem.
- Bug 512511 (RESOLVED FIXED, commit
`f0f00551a40ec071095c7f3076221d898ffb95af`) — DPMS coordination through
`Workspace`/`OutputConfiguration`. Different errno (EPERM/"Permission denied"),
different trigger (dim/full-screen DPMS transition), fix scoped to
`drm_backend.cpp` / `drm_output.cpp` and does not touch the general atomic-test
path.
- Bug 512968 (RESOLVED FIXED, commit
`a29fb8afaa7a41329df2575e328dad2cfb51ff9f`) —
`RenderLoopDrivenQAnimationDriver::advanceToNextFrame()` in `compositor.cpp`.
A QtQuick animation-driver timing regression that crashes only on the legacy
`KWIN_DRM_NO_AMS=1` path. Different subsystem from the path taken when atomic
modesetting is enabled.

The recurrence of unrelated reports clustering under the same warning string is
the practical motivation for request 1 above (promote the per-pipeline
diagnostic dump to mainline).


== Available diagnostic artefacts ==

The following are available on request and can be attached to the ticket:
- Full systemd-journal export (~24 MB) covering the crash window
- Patched KWin build: prefix `~/kwin-src/install`, source at commit 203f8d1 +
the single diagnostic commit above
- `kscreen-doctor -o` output in both the 2-monitor (stable) and 3-monitor (just
before crash) states
- `lspci -vvv` for the RTX 4090
- `cat /sys/class/drm/card*/device/uevent`
- `cat /proc/cmdline`
- `glxinfo -B` and `eglinfo` output
