https://bugs.kde.org/show_bug.cgi?id=520562

Matei Marcu <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #193126|0                           |1
        is obsolete|                            |

--- Comment #5 from Matei Marcu <[email protected]> ---
Created attachment 193127
  --> https://bugs.kde.org/attachment.cgi?id=193127&action=edit
Corrected log + comment bundle. Supersedes previous attachment.

Thanks for the review and the pointer to the crash-report guide.

**Errata to the original report:** DP-1 (Alienware AW2725DF) and DP-3 (ASUS PG27AQWP-W) currently run in landscape orientation. The original report's "portrait" framing reflected a temporary rotation experiment carried out during local diagnostics; physical rotation turned out not to be load-bearing for this failure (the EINVAL cascade fires identically with the outputs in either orientation, and DRM mode strings are always reported in native-landscape resolution regardless of plane rotation).

A note on the artifact shape: this is a configuration-loop bug rather than a SIGSEGV, so `DrKonqi` does not fire, `coredumpctl` has no entry, and a postmortem backtrace from a coredump is not available. In place of that, this capture pairs `strace -e ioctl` on `kwin_wayland` (which records every `DRM_IOCTL_MODE_ATOMIC` call during the failure cascade) with the per-pipeline diagnostic output from a locally patched KWin build, plus a symbolic `gdb thread apply all bt full` captured shortly after KWin gives up retrying (it shows the post-failure idle state of all seven KWin threads with full source line resolution via debuginfod).

The narrative is in the inline summary below; the bulk artifacts are attached.

## Updated failure description (sharper than the original report)

The original report described the symptom as a "wedge".
With the new strace capture the failure model is more precise: KWin's `KWin::DrmGpu::testPipelines` (`src/backends/drm/drm_gpu.cpp:484`) calls `commitPipelines(CommitMode::TestAllowModeset)` with all three outputs bundled into a single atomic test commit. The kernel returns `EINVAL`. `KWin::DrmGpu::checkCrtcAssignment` (`drm_gpu.cpp:355`) then retries with a different CRTC-to-connector permutation. Every permutation returns `EINVAL`. KWin exhausts the search after 168 atomic test commits in ~4.2 seconds and disables DP-2 (and on some runs cascades to disabling all outputs).

The kernel is not deadlocked. The `nvidia-modeset` driver returns `EINVAL` synchronously from each `DRM_IOCTL_MODE_ATOMIC` call; no thread is stuck in an ioctl. The earlier report's "hung-task watchdog" framing was likely incorrect: KWin's threads in the present capture are not stuck in any ioctl; they sleep normally in `pthread_cond_wait` after the recursive search gives up. A separate `v4l2_open` BUG_ON in the out-of-tree `v4l2loopback` module was observed in `dmesg` during this investigation and may have been the actual source of the watchdog dumps in the original report; this is not asserted with full confidence.

## Inline summary of the strace capture (the actual gold)

From `strace -f -tt -v -e trace=ioctl` on `kwin_wayland` during a single trigger event:

```
total DRM_IOCTL_MODE_ATOMIC calls : 475

returns:
     0    307   (succeeded)
    -1    168   (EINVAL)

flags distribution:
    ATOMIC_TEST_ONLY|ATOMIC_ALLOW_MODESET    168   <-- ALL FAIL with EINVAL
    ATOMIC_TEST_ONLY|ATOMIC_NONBLOCK         156   <-- all succeed
    PAGE_FLIP_EVENT|ATOMIC_NONBLOCK          151   <-- all succeed

count_objs distribution:
     2 objects    145 calls   (page flips, succeed)
     3 objects    162 calls   (single-output tests, succeed)
    21 objects    168 calls   (three-output bundled tests, all EINVAL)
```

Interpretation:

- Single-output atomic tests (`count_objs=3`, one CRTC + one connector + one plane) pass throughout.
- Page-flip submissions (`count_objs=2`) on individual outputs pass throughout.
- Only the combined three-output test commit (`count_objs=21`) under `ATOMIC_ALLOW_MODESET` fails, and it fails every time, 168 in a row.
- The 168 figure is the call count of `commit->testAllowModeset()` (which performs `DRM_IOCTL_MODE_ATOMIC` with `TEST_ONLY | ALLOW_MODESET`) inside `checkCrtcAssignment`'s recursive CRTC-to-connector search. The four distinct CRTC IDs cycled through are 62, 81, 100, 119. The exact derivation of the 168 number from the recursion depth is not asserted here.

The patched diagnostic build prints the connector/CRTC/mode triple for every pipeline in every failed attempt; that output is 504 lines (= 168 × 3) and is in `journal-user.txt` in the attachment bundle. A representative window:

```
pipeline connector="PG27AQWP-W" crtc=62 mode=2560x1440@540Hz needsModeset=0
pipeline connector="AW2725DF" crtc=81 mode=2560x1440@360Hz needsModeset=0
pipeline connector="BenQ PD3200U" crtc=100 mode=3840x2160@60Hz needsModeset=0
Atomic modeset test failed! Invalid argument
pipeline connector="PG27AQWP-W" crtc=62 mode=2560x1440@540Hz needsModeset=0
pipeline connector="AW2725DF" crtc=81 mode=2560x1440@360Hz needsModeset=0
pipeline connector="BenQ PD3200U" crtc=119 mode=3840x2160@60Hz needsModeset=0
Atomic modeset test failed! Invalid argument
... 166 more attempts permuting BenQ across CRTCs 62, 81, 100, 119 ...
```

The BenQ PD3200U is requested at 3840x2160@60 Hz on every attempt. The other two outputs (ASUS PG27AQWP-W at 2560x1440@540 Hz and Alienware AW2725DF at 2560x1440@360 Hz) keep their modes. KWin reshuffles CRTC assignments across 168 permutations; the kernel rejects each one.

This narrows the question to nvidia-modeset's atomic-commit validation path for `DRM_MODE_ATOMIC_ALLOW_MODESET` with three concurrent outputs when the BenQ's 4K60 timing is in the set.
Single-output and page-flip commits validate fine; the combined three-output modeset test is the only commit shape that is rejected.

## Userspace context (per the KDE crash-report guide's KWin-specific section)

- Compositing: enabled. No `Backend=` override in `~/.config/kwinrc`; KWin 6.6.4 default applies (OpenGL).
- Effects: no `[Plugins]` section in `~/.config/kwinrc`; defaults apply for Plasma 6.6.5.
- Decorations: Aurorae library (`org.kde.kwin.aurorae.v2`), theme `__aurorae__svg__Fluent-round-dark`. Not Breeze. Window decoration plumbing is not in the failing atomic property set, so this is unlikely to matter; it is stated here for the record.
- Drivers: nvidia-beta-dkms 595.71.05 (closed). The bug also reproduces with nvidia-open-dkms 595.71.05, where the same EINVAL atomic-test failure is accompanied by an additional EGLImage `GL_INVALID_OPERATION` cascade.
- Kernel: 7.0.9-zen2-1-zen
- KWin: 6.6.4 with a single local diagnostic patch at `src/backends/drm/drm_pipeline.cpp:138-149` that emits `qCWarning(KWIN_DRM)` per-pipeline state on `testAllowModeset()` failure. The patched binary is at `~/kwin-src/install/bin/kwin_wayland`; `kwin-id.txt` records the path and `kwin --version` output (`kwin 6.6.4`).
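
For anyone who wants to cross-check the return-value counts in the inline strace summary against a raw log, here is a minimal Python sketch. It is not the script that produced `atomic-decoded.txt`. It assumes each completed call appears on one line that contains the request name `DRM_IOCTL_MODE_ATOMIC` and ends in strace's usual `) = <ret>` or `) = -1 ERRNO (message)` suffix. The function name `tally_atomic_results` and the `ok` / `error` / `unparsed` bucket labels are made up for illustration. Calls that strace splits into `<unfinished ...>` / `resumed` pairs are only counted as `unparsed`, not stitched back together.

```python
import re
import sys
from collections import Counter

# Trailing result of a completed strace line: ") = 0" or
# ") = -1 EINVAL (Invalid argument)". Group 1 is the return value,
# group 2 the errno name when strace printed one.
RESULT_RE = re.compile(r"\)\s+=\s+(-?\d+)(?:\s+([A-Z][A-Z0-9]*)\b.*)?\s*$")


def tally_atomic_results(lines, request="DRM_IOCTL_MODE_ATOMIC"):
    """Count outcomes of `request` ioctls in an iterable of strace lines.

    Keys are "ok" for non-negative returns, the errno name for failures
    that carry one, "error" for a negative return without a name, and
    "unparsed" for lines that mention the request but have no result
    (for example the "<unfinished ...>" half of a split call).
    """
    counts = Counter()
    for line in lines:
        if request not in line:
            continue
        match = RESULT_RE.search(line)
        if match is None:
            counts["unparsed"] += 1
        elif match.group(2):
            counts[match.group(2)] += 1
        elif int(match.group(1)) >= 0:
            counts["ok"] += 1
        else:
            counts["error"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py strace-ioctl.txt
    with open(sys.argv[1], errors="replace") as log:
        for outcome, n in sorted(tally_atomic_results(log).items()):
            print(f"{outcome:>10} {n}")
```

If the assumptions hold for `strace-ioctl.txt`, the `ok` and `EINVAL` buckets should match the 307 / 168 split reported above. A non-zero `unparsed` bucket would mean some calls were interleaved across threads and need to be paired up before the totals can be trusted.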

## Attachments

- `strace-ioctl.txt`: strace with full ioctl decode, captures the 168 EINVAL events
- `atomic-decoded.txt`: Python-decoded human-readable summary of every DRM atomic ioctl
- `journal-user.txt`: patched-build per-pipeline diagnostic output (504 lines)
- `journal-kernel.txt`: kernel side, includes SysRq +w / +t task dumps
- `dmesg.txt`: kernel ring buffer through the burst
- `gdb-bt-postwedge.txt`: symbolic `thread apply all bt full` captured shortly after the failure cascade ended; seven KWin threads, post-failure idle state, full source line resolution via debuginfod
- `wchan.txt`: per-thread kernel wait-channel
- `kwin-id.txt`: binary identity, version, command-line
- `modules-env.txt`: kernel cmdline, nvidia version, relevant lsmod entries
- `kmsg-burst.txt`: /dev/kmsg readout bracketing the failure window

Happy to capture anything else that would help, including `gdb` set up to break on `drmModeAtomicCommit` if the property/value array of a failing call would be useful, or an `ftrace` of the `drm:` event family during the burst.

-- 
You are receiving this mail because:
You are watching all bug changes.
