https://bugzilla.kernel.org/show_bug.cgi?id=221543

            Bug ID: 221543
           Summary: [nouveau] RTX 5080 (GB203): Xid:13 WIDTH CT Violation
                    on kernels 6.16.7+ causing system hang
           Product: Drivers
           Version: 2.5
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: [email protected]
          Reporter: [email protected]
        Regression: No

Hardware: Lenovo ThinkStation P8 (30HJS53F00)
  CPU: AMD Threadripper PRO 7975WX (32C/64T)
  GPU: NVIDIA GeForce RTX 5080 (GB203, rev a1) [PCI 81:00.0]
  RAM: 128GB
  OS: Fedora 43 (Workstation)
  Affected kernels: 6.16.7 through 7.0.8 (latest tested)
  Working kernel: 6.13.6-200.fc41

  Problem:
  On kernels 6.16.7 and later, the nouveau driver triggers repeated Xid:13
  (WIDTH CT Violation) errors on all 7 GPCs of the RTX 5080. The errors
  begin once Chrome is started, and they still occur with hardware
  acceleration disabled in Chrome.

  The error cascade is:
  1. Chrome triggers Xid:13 on nouveau
  2. Chrome GPU channel is killed
  3. Xid:13 repeats every ~20 minutes
  4. Xwayland encounters an MMU fault
  5. System hangs completely (B0E1/S202 diagnostic codes on chassis)

  Reproducer:
  - Boot any kernel >= 6.16.7 with nouveau on RTX 5080
  - Open Chrome (or Chromium)
  - Xid:13 errors appear within seconds in dmesg
  - System hangs within hours (sometimes minutes)
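
  To help confirm the reproducer and the ~20 minute repeat interval, the
  kernel log can be scanned for the Xid:13 lines. The following is a minimal
  sketch, not part of the original report. It assumes the matching lines
  contain the literal text "Xid:13" as quoted above and carry the default
  dmesg "[seconds.micros]" timestamp prefix; the function names are
  hypothetical.

```python
import re
from typing import Iterable, List

# Default dmesg lines start with a boot-relative "[seconds.micros]" stamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def xid13_timestamps(lines: Iterable[str]) -> List[float]:
    """Return boot-relative timestamps of nouveau lines mentioning Xid:13.

    Assumption: affected lines contain both "nouveau" and the literal
    "Xid:13" wording used in this report. Adjust the substrings if the
    real log text differs.
    """
    stamps = []
    for line in lines:
        if "nouveau" in line and "Xid:13" in line:
            match = TIMESTAMP.match(line)
            if match:
                stamps.append(float(match.group(1)))
    return stamps


def intervals_minutes(stamps: List[float]) -> List[float]:
    """Gaps between consecutive Xid:13 events, in minutes."""
    return [(later - earlier) / 60.0
            for earlier, later in zip(stamps, stamps[1:])]
```

  For example, after saving `dmesg` output to a file, passing the open file
  to xid13_timestamps() and the result to intervals_minutes() shows whether
  the gaps cluster around 20 minutes.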

  What does NOT help:
  - Switching to X11 (hangs before reaching graphical mode)
  - Disabling Chrome hardware acceleration
  - Disabling C-States in BIOS

  Note: glxgears and GPU stress tests do NOT reproduce the problem; only
  Chrome triggers it.

  Workaround:
  Downgrade to kernel 6.13.6-200.fc41. On that kernel, nouveau reports
  "unknown chipset" for RTX 5080 (expected, as GB203 support was added
  later), so the proprietary NVIDIA driver is required.

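  This suggests the fault was introduced between 6.13.6 and 6.16.7, in
  nouveau's GB203/RTX 5080 support path. Because nouveau does not drive this
  GPU on 6.13.6, it may be a defect in the newly added support rather than a
  break in previously working code.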
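
  When testing kernels to narrow that range, it can help to classify a
  kernel release string against the versions reported here. This is a rough
  sketch under stated assumptions: the three version bounds are taken from
  this report, while the function names and category labels are
  hypothetical.

```python
import re
from typing import Tuple

# Version bounds taken from this report.
KNOWN_GOOD = (6, 13, 6)       # 6.13.6-200.fc41 (nouveau: "unknown chipset")
FIRST_BAD = (6, 16, 7)        # earliest kernel reported affected
LAST_TESTED_BAD = (7, 0, 8)   # latest kernel tested, still affected


def parse_release(release: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y[.Z]' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def classify(release: str) -> str:
    """Place a kernel release relative to the versions in this report."""
    version = parse_release(release)
    if version == KNOWN_GOOD:
        return "reported working"
    if FIRST_BAD <= version <= LAST_TESTED_BAD:
        return "reported affected"
    if KNOWN_GOOD < version < FIRST_BAD:
        return "untested, inside suspect range"
    return "outside reported range"
```

  On a running system, classify(platform.release()) from the Python standard
  library's platform module gives the category for the booted kernel.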
