Hi Alex,

Thank you for the quick response and for the information about hibernation 
support.

Here are the stack traces showing the call chains when the TLB flush
failures occur. The issue happens in two places:

1. During resume (hibernation restore):

Call Trace:
 dump_stack_lvl+0x5b/0x80
 amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
 gmc_v9_0_hw_init+0x2e2/0x390 [amdgpu]
 gmc_v9_0_resume+0x26/0x70 [amdgpu]
 amdgpu_ip_block_resume+0x27/0x50 [amdgpu]
 amdgpu_device_ip_resume_phase1+0x55/0x90 [amdgpu]
 amdgpu_device_resume+0x69/0x380 [amdgpu]
 amdgpu_pmops_resume+0x46/0x80 [amdgpu]
 dpm_run_callback+0x4a/0x150
 device_resume+0x1df/0x2f0
 async_resume+0x21/0x30
 async_run_entry_fn+0x36/0x160
 process_one_work+0x193/0x350
 worker_thread+0x2d7/0x410

2. Subsequent failures during command submission:

Call Trace:
 dump_stack_lvl+0x5b/0x80
 amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
 amdgpu_gmc_flush_gpu_tlb+0xd0/0x280 [amdgpu]
 amdgpu_gart_invalidate_tlb.part.0+0x59/0x90 [amdgpu]
 amdgpu_ttm_alloc_gart+0x146/0x180 [amdgpu]
 amdgpu_cs_parser_bos.isra.0+0x5d6/0x7d0 [amdgpu]
 amdgpu_cs_ioctl+0xbd0/0x1aa0 [amdgpu]
 drm_ioctl_kernel+0xa6/0x100
 drm_ioctl+0x262/0x520
 amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]

Error message:
"amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706"

Full dmesg log available at: 
https://gitlab.freedesktop.org/-/project/4522/uploads/6a285ad2e24f4807e5d75c3f4ed5a7a1/dmesg-dump-stack.txt

Regarding the hibernation support issues you mentioned, I understand the
limitations with secure boot and VRAM eviction. In my case, secure boot is
disabled and I have sufficient swap space, so the primary issue I'm
encountering is this TLB flush failure.

I'm happy to test any patches or help with further debugging if needed.

Thanks,
Ionut
