On Monday, 22 December 2025, 0:18:39, Thomas Glanzmann wrote:
> Hello,
> I have a PC which worked fine for many years that I did not use for half
> a year. Yesterday I want to use it, but sway appears to be crashing
> amdgpu in DRM. The components are:
>
> - ASUS System Product Name/TUF GAMING B650M-PLUS
> - AMD Ryzen 9 7950X 16-Core Processor
> - Debian trixie
Hi Thomas,

Which GPU do you have? Are you using the iGPU from the 7950X?

Also, for this kind of issue it would be nice to mention details such as:
- Kernel version, Mesa version, amdgpu firmware version
- Does the crash still happen on a recent kernel?

I suggest opening an issue in the drm/amd bug tracker with all those details:
https://gitlab.freedesktop.org/drm/amd/-/issues

>
> I already tried the following:
>
> - Upgrading to Debian forky
> - Debian trixie live cd
> - Installing the latested amd gpu firmware
> - Updating the Bios to the latest.
>
> In order to reproduce the issue, I boot linux, start sway and open an
> alacritty terminal with a tmux inside. amdgpu crashes immediatly. Find
> here a video and the full dmesg.

Unfortunately, the dmesg log is not actionable. It shows that there was a GPU
hang, but there is no indication of what was happening. It does show that
there was a successful GPU recovery, though. Does your system stay usable
afterwards?

The devcoredump is also not actionable because it has no details about what
was happening on the GPU as it was crashing.

If you know that the system used to work well beforehand, the best approach
would be to tell us which kernel version used to work and which is the first
version that broke, and to bisect from there.
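
To make such a report easier to put together, the details listed above could
be gathered with a small script. The following is only a rough sketch, not an
official tool. It assumes a Debian system where Mesa and the GPU firmware come
from the `libgl1-mesa-dri` and `firmware-amd-graphics` packages, and that the
dmesg output has been saved to a text file with one log entry per line. The
function names are made up for illustration.

```python
import platform
import re
import shutil
import subprocess
import sys

# Matches amdgpu ring timeout entries such as:
#   amdgpu: ring gfx_0.0.0 timeout, signaled seq=106, emitted seq=108
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(dmesg_text):
    """Return a (ring, signaled, emitted) tuple for each ring timeout found."""
    return [
        (m.group("ring"), int(m.group("signaled")), int(m.group("emitted")))
        for m in TIMEOUT_RE.finditer(dmesg_text)
    ]


def package_version(package):
    """Ask dpkg-query for a package version (assumes a Debian-like system)."""
    if shutil.which("dpkg-query") is None:
        return "unknown (dpkg-query not found)"
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        capture_output=True, text=True, check=False,
    )
    version = result.stdout.strip()
    if result.returncode == 0 and version:
        return version
    return "unknown (package not installed?)"


def build_report(dmesg_text):
    """Assemble the version details and ring timeouts into a short report."""
    lines = [
        f"Kernel: {platform.release()}",
        # Package names below are assumptions about a Debian install.
        f"Mesa (libgl1-mesa-dri): {package_version('libgl1-mesa-dri')}",
        f"Firmware (firmware-amd-graphics): "
        f"{package_version('firmware-amd-graphics')}",
    ]
    for ring, signaled, emitted in find_ring_timeouts(dmesg_text):
        lines.append(
            f"Ring timeout: {ring} (signaled seq={signaled}, "
            f"emitted seq={emitted})"
        )
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 report.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(build_report(log.read()))
```

Saved under a name such as `report.py` (hypothetical), it could be run as
`python3 report.py dmesg.txt` and the output pasted into the issue. It only
summarizes what is already in the log and does not replace a bisect.
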
>
> https://tg.st/u/dmesg_9f62587406fb808dc4d91d41029ccf88ceeadf13e1f91d65c27b57536f375550.txt
> https://tg.st/u/amdgpu_device_coredump_data_a25f2060c56260bb46ac95ee3123969d5127bf31b29ea3adfe3feeac67bf4edc.zst
> https://tg.st/u/VID_20251222_071051104.mp4
>
> [ 57.342777] amdgpu 0000:0b:00.0: amdgpu: Dumping IP State
> [ 57.343822] amdgpu 0000:0b:00.0: amdgpu: Dumping IP State Completed
> [ 57.343869] amdgpu 0000:0b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
> [ 57.343871] amdgpu 0000:0b:00.0: amdgpu: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
> [ 57.343872] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=106, emitted seq=108
> [ 57.343873] amdgpu 0000:0b:00.0: amdgpu: Process sway pid 2021 thread sway:cs0 pid 2317
> [ 57.343875] amdgpu 0000:0b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
> [ 57.485168] amdgpu 0000:0b:00.0: amdgpu: Ring gfx_0.0.0 reset failed
> [ 57.485170] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
> [ 57.609921] amdgpu 0000:0b:00.0: amdgpu: MODE2 reset
> [ 57.616920] amdgpu 0000:0b:00.0: amdgpu: GPU reset succeeded, trying to resume
> [ 57.617008] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
> [ 57.617024] amdgpu 0000:0b:00.0: amdgpu: PSP is resuming...
> [ 57.638326] amdgpu 0000:0b:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
> [ 57.832236] amdgpu 0000:0b:00.0: amdgpu: RAS: optional ras ta ucode is not available
> [ 57.837959] amdgpu 0000:0b:00.0: amdgpu: RAP: optional rap ta ucode is not available
> [ 57.837961] amdgpu 0000:0b:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
> [ 57.837963] amdgpu 0000:0b:00.0: amdgpu: SMU is resuming...
> [ 57.838869] amdgpu 0000:0b:00.0: amdgpu: SMU is resumed successfully!
> [ 57.839132] amdgpu 0000:0b:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
> [ 57.842333] amdgpu 0000:0b:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
> [ 57.944932] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> [ 57.944935] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
> [ 57.944936] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
> [ 57.944937] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
> [ 57.944938] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
> [ 57.944938] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
> [ 57.944939] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
> [ 57.944939] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
> [ 57.944940] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
> [ 57.944940] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
> [ 57.944941] amdgpu 0000:0b:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
> [ 57.944941] amdgpu 0000:0b:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
> [ 57.944942] amdgpu 0000:0b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
> [ 57.944943] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
> [ 57.944943] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
> [ 57.944944] amdgpu 0000:0b:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
> [ 57.948092] amdgpu 0000:0b:00.0: amdgpu: GPU reset(1) succeeded!
> [ 57.948107] amdgpu 0000:0b:00.0: [drm] device wedged, but recovered through reset
> [ 57.961832] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
>
> I'm grateful for any pointers that resolve the issue and available for
> debugging.
>
> Cheers,
> Thomas
