On Tue, Mar 17, 2026 at 4:24 AM Cristian Cocos <[email protected]> wrote:
>
> ## Summary
>
> RADV crashes with a `[gfxhub] Page fault at address: 0x0000000000000000` when
> performing Vulkan rendering on an AMD RX 9060 XT (Navi 44, GFX1200). The
> crash occurs ~20-30 seconds into video playback in mpv using `vo=gpu-next`
> with `gpu-api=vulkan` (libplacebo). Multiple GPU rings (sdma0, gfx_0.0.0,
> comp_1.x.x) time out simultaneously. The kernel driver recovers the rings,
> but the Vulkan context is lost.
>
> **Critically, the crash also occurs when video decode is offloaded to VA-API
> on a separate Intel iGPU** — only the Vulkan rendering path (libplacebo →
> RADV → `vkQueueSubmit2`) is involved. This rules out
> VK_KHR_video_decode_queue as the cause.
>
Please file a mesa ticket:
https://gitlab.freedesktop.org/mesa/mesa/-/issues
And include your full dmesg output from boot to when the issue happens.

Alex

> ## System Information
>
> | Component | Version |
> |-----------|---------|
> | GPU | AMD Radeon RX 9060 XT — Navi 44, RDNA 4, GFX1200 [1002:7590] (rev c0) |
> | Mesa | 26.0.2-1 (also reproduced on 26.0.1) |
> | vulkan-radeon | 26.0.2-1 |
> | libplacebo | v7.360.0 |
> | Kernel | 6.19.8-zen1-1-zen |
> | Firmware | linux-firmware-amdgpu 20260309-1 (SMC firmware 102.70.0) |
> | CPU | 13th Gen Intel Core i7-1360P |
> | Distro | blendOS (Arch-based, rolling) |
> | mpv | v0.41.0, FFmpeg n8.0.1 |
> | Connection | eGPU via Thunderbolt 4 (Razer Core X V2), PCIe 32 GT/s x16 link |
>
> ### Module parameters
>
> ```
> options amdgpu runpm=0 rebar=0 ppfeaturemask=0xFFFF7FFF
> ```
>
> - `runpm=0` — runtime PM disabled (TB eGPU SMU limitation)
> - `rebar=0` — BIOS assigns full 16 GB BAR, driver does not resize
> - `ppfeaturemask=0xFFFF7FFF` — GFXOFF disabled (bit 15) due to SMU IF version
>   mismatch (driver 0x2E vs firmware 0x33)
>
> **Note:** The SMU interface version mismatch (`smu_v14_0: SMU driver if
> version not matched`) is a separate known issue. GFXOFF is disabled to
> prevent a bus-loss crash, but the rendering crash described here is unrelated
> — it occurs during active rendering, not during idle.
>
> ## Steps to Reproduce
>
> 1. Install an AMD RX 9060 XT (Navi 44)
> 2. Configure mpv with Vulkan rendering:
>    ```
>    vo=gpu-next
>    gpu-api=vulkan
>    gpu-context=waylandvk
>    vulkan-device='AMD Radeon RX 9060 XT (RADV GFX1200)'
>    vulkan-async-compute=yes
>    vulkan-async-transfer=yes
>    ```
> 3. Play any video file: `mpv /path/to/video.mkv`
> 4. Wait ~20-30 seconds
>
> ### Test 1: Vulkan decode + Vulkan rendering (`hwdec=vulkan`)
>
> Crashes after ~26 seconds.
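
For anyone double-checking the `ppfeaturemask=0xFFFF7FFF` value quoted under "Module parameters" above, here is a minimal Python sketch that lists which bit positions a mask leaves cleared. The helper name `cleared_bits` and the 32-bit width are assumptions for illustration. The sketch only confirms the arithmetic (bit 15 is the single cleared bit); that bit 15 corresponds to GFXOFF is taken from the report, not derived here.

```python
def cleared_bits(mask: int, width: int = 32) -> list[int]:
    """Return the bit positions (0 = least significant) that are 0 in mask.

    `width` is an assumption: the mask is treated as a 32-bit value.
    """
    return [bit for bit in range(width) if not (mask >> bit) & 1]


# Value taken from the quoted modprobe options line.
mask = 0xFFFF7FFF
print(cleared_bits(mask))  # [15]
```
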
>
> ### Test 2: VA-API decode (Intel iGPU) + Vulkan rendering (`hwdec=vaapi`)
>
> **Also crashes after ~26 seconds.** VA-API decode runs on the Intel iGPU
> (`iHD_drv_video.so`), only Vulkan rendering runs on the AMD GPU via RADV.
> This isolates the bug to the RADV rendering path.
>
> ## RADV Error Output
>
> ```
> radv/amdgpu: The CS has been cancelled because the context is lost.
> This context is guilty of a hard recovery.
>
> [vo/gpu-next/libplacebo] vkQueueSubmit2: VK_ERROR_DEVICE_LOST (../src/vulkan/command.c:514)
> [vo/gpu-next/libplacebo] Retrieving query pool results: VK_ERROR_DEVICE_LOST (../src/vulkan/gpu.c:105)
> [vo/gpu-next/libplacebo] Failed holding swapchain image for presentation
> [vo/gpu-next] Failed presenting frame!
> [ffmpeg] vk: Unable to submit command buffer: VK_ERROR_DEVICE_LOST
> [ffmpeg/video] h264: hardware accelerator failed to decode picture
> ```
>
> ## Kernel Log (Crash 1 — hwdec=vulkan, Mesa 26.0.2)
>
> ```
> amdgpu 0000:06:00.0: amdgpu: Dumping IP State
> amdgpu 0000:06:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
> amdgpu 0000:06:00.0: amdgpu: ring sdma0 timeout, signaled seq=11425, emitted seq=11427
> amdgpu 0000:06:00.0: amdgpu: Starting sdma0 ring reset
> amdgpu 0000:06:00.0: amdgpu: Ring sdma0 reset succeeded
> amdgpu 0000:06:00.0: [drm] device wedged, but recovered through reset
> amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=16289, emitted seq=16291
> amdgpu 0000:06:00.0: amdgpu: Process mpv pid 44985 thread vo pid 45004
> amdgpu 0000:06:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
> amdgpu 0000:06:00.0: [drm] device wedged, but recovered through reset
> amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=13, emitted seq=14
> amdgpu 0000:06:00.0: amdgpu: Process mpv pid 44985 thread vo pid 45004
> amdgpu 0000:06:00.0: amdgpu: Ring comp_1.1.0 reset succeeded
> amdgpu 0000:06:00.0: [drm] device wedged, but recovered through reset
> amdgpu 0000:06:00.0: amdgpu: Fence fallback timer expired on ring sdma1
> amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:416:crtc-0] flip_done timed out
> ```
>
> ## Kernel Log (Crash 2 — hwdec=vaapi, Mesa 26.0.2)
>
> ```
> amdgpu 0000:06:00.0: amdgpu: ring sdma0 timeout, signaled seq=13615, emitted seq=13617
> amdgpu 0000:06:00.0: amdgpu: Ring sdma0 reset succeeded
> amdgpu 0000:06:00.0: [drm] device wedged, but recovered through reset
> amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=30731, emitted seq=30733
> amdgpu 0000:06:00.0: amdgpu: Process mpv pid 66481 thread vo pid 66500
> amdgpu 0000:06:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
> amdgpu 0000:06:00.0: [drm] device wedged, but recovered through reset
> amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 timeout, signaled seq=312, emitted seq=313
> ```
>
> ## GPU Device Coredump (Crash 1)
>
> ```
> **** AMDGPU Device Coredump ****
> version: 1
> kernel: 6.19.8-zen1-1-zen
> module: amdgpu
> time: 3054.167340782
>
> SOC Device id: 30096
> SOC Family: 152
> SOC External Revision id: 65
>
> HWIP: GC[1][0]: v12.0.0.0.0
> HWIP: SDMA0[3][0]: v7.0.0.0.0
> HWIP: MMHUB[12][0]: v4.1.0.0.0
>
> Ring timed out details
> IP Type: 2 Ring Name: sdma0
>
> [gfxhub] Page fault observed
> Faulty page starting at address: 0x0000000000000000
> Protection fault status register: 0x0
> ```
>
> **Full coredump available on request** (543 KB).
>
> ## Analysis
>
> - The crash is a **NULL pointer dereference at GPU virtual address 0x0** —
>   RADV is submitting commands that reference unmapped memory.
> - The `Protection fault status register: 0x0` suggests the fault info itself
>   is zeroed, which may indicate the fault occurred very early in command
>   processing or in an SDMA copy from a NULL source.
> - The fault hits sdma0 first, then cascades to gfx_0.0.0 and a compute ring —
>   consistent with a resource upload (SDMA) referencing a NULL buffer, followed
>   by the GFX/compute rings trying to use the result.
> - After ring resets, the GPU fully recovers (all fences drain, PCIe link
>   stays up at 32 GT/s x16), confirming this is a userspace (RADV) command
>   stream issue, not a hardware or kernel driver bug.
> - The `flip_done timed out` on CRTC-0 is a secondary effect — the
>   compositor's page flip can't complete while rings are being reset, which
>   restarts the GNOME session.
>
> ## Additional Notes
>
> - The GPU is connected via Thunderbolt 4 (eGPU enclosure), but the PCIe link
>   stays healthy through the crash — this is not a link/BAR issue.
> - This was also reproduced on Mesa 26.0.1 with kernel 6.19.6 and firmware
>   20260221 (SMC 102.69.0) — same crash signature.
> - Desktop compositing (GNOME Shell / Mutter on Wayland) works fine on this
>   GPU — only mpv's libplacebo rendering pipeline triggers the crash.
> - `vulkan-async-compute=yes` was enabled. Not yet tested with async compute
>   disabled, though the fault is on sdma0, not a compute ring.
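
When collecting the full dmesg, it may help to summarise the ring timeouts in the order they appear. This Python sketch pulls them out of log text. It is only an illustration: the regular expression is modelled on the timeout lines quoted above, `ring_timeouts` is a hypothetical helper name, and reading `emitted - signaled` as the number of fences still outstanding is an assumption. It also expects raw, unwrapped dmesg lines rather than the hard-wrapped copy in this mail.

```python
import re

# Pattern modelled on the "ring <name> timeout" lines quoted in the report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(log_text):
    """Return (ring, signaled, emitted, pending) tuples in log order."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        # Assumption: the gap is the count of fences emitted but not signaled.
        results.append((match["ring"], signaled, emitted, emitted - signaled))
    return results


# Sample lines copied from the Crash 1 kernel log (unwrapped).
sample = """\
amdgpu 0000:06:00.0: amdgpu: ring sdma0 timeout, signaled seq=11425, emitted seq=11427
amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=16289, emitted seq=16291
amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=13, emitted seq=14
"""

for ring, signaled, emitted, pending in ring_timeouts(sample):
    print(f"{ring}: signaled={signaled} emitted={emitted} pending={pending}")
```

On the Crash 1 excerpt this lists sdma0 first, then gfx_0.0.0, then comp_1.1.0, which is the ordering the Analysis section relies on.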
