Hi Ionut,

On 2/1/26 20:05, Ionut Nechita (Sunlight Linux) wrote:
> Hi Alex,
> 
> Thank you for the quick response and for the information about hibernation 
> support.
> 
> Here's the stack trace showing the call chain when the TLB flush failures 
> occur. The issue happens in two places:
> 
> 1. During resume (hibernation restore):
> 
> Call Trace:
>  dump_stack_lvl+0x5b/0x80
>  amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
>  gmc_v9_0_hw_init+0x2e2/0x390 [amdgpu]
>  gmc_v9_0_resume+0x26/0x70 [amdgpu]
>  amdgpu_ip_block_resume+0x27/0x50 [amdgpu]
>  amdgpu_device_ip_resume_phase1+0x55/0x90 [amdgpu]
>  amdgpu_device_resume+0x69/0x380 [amdgpu]
>  amdgpu_pmops_resume+0x46/0x80 [amdgpu]
>  dpm_run_callback+0x4a/0x150
>  device_resume+0x1df/0x2f0
>  async_resume+0x21/0x30
>  async_run_entry_fn+0x36/0x160
>  process_one_work+0x193/0x350
>  worker_thread+0x2d7/0x410
> 
> 2. Subsequent failures during command submission:
> 
> Call Trace:
>  dump_stack_lvl+0x5b/0x80
>  amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
>  amdgpu_gmc_flush_gpu_tlb+0xd0/0x280 [amdgpu]
>  amdgpu_gart_invalidate_tlb.part.0+0x59/0x90 [amdgpu]
>  amdgpu_ttm_alloc_gart+0x146/0x180 [amdgpu]
>  amdgpu_cs_parser_bos.isra.0+0x5d6/0x7d0 [amdgpu]
>  amdgpu_cs_ioctl+0xbd0/0x1aa0 [amdgpu]
>  drm_ioctl_kernel+0xa6/0x100
>  drm_ioctl+0x262/0x520
>  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> 
> Error message: "amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait 
> reg 1a706"

Well, thanks for doing this, but unfortunately it doesn't help us solve the
problem. That result is perfectly expected.

Can you check if the KIQ ring test is executed correctly after hibernation? 
E.g. what happens when amdgpu_ring_test_ring(kiq_ring) is called? Is that 
called at all?
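
As a rough starting point, the dmesg dump can be scanned for the register write failure alongside any ring test messages. The sketch below is only an illustration: the first pattern is taken from the error quoted in this thread, while the two ring test patterns and the helper name scan_dmesg are assumptions about the driver's log wording and should be adjusted to whatever the running kernel actually prints.

```python
import re

# The first pattern matches the error message quoted in this thread.
# The two ring-test patterns are ASSUMPTIONS about the driver's log
# wording; adjust them to match the kernel version under test.
PATTERNS = {
    "reg_write_wait_failed": re.compile(
        r"failed to write reg ([0-9a-f]+)\s+wait\s+reg ([0-9a-f]+)"
    ),
    "ring_test_failed": re.compile(r"ring (\S+) test failed"),
    "ring_test_ok": re.compile(r"ring test on (\S+) succeeded"),
}


def scan_dmesg(lines):
    """Return (line_number, pattern_name, text) for every matching log line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits
```

Feeding it the lines of a saved log, e.g. scan_dmesg(open("dmesg-dump-stack.txt")), lists the matches in log order. If no KIQ ring test line shows up after the resume, that only suggests the test did not log anything; success messages may be emitted at debug level and hidden unless DRM debug output is enabled.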

Thanks,
Christian.

> Full dmesg log available at: 
> https://gitlab.freedesktop.org/-/project/4522/uploads/6a285ad2e24f4807e5d75c3f4ed5a7a1/dmesg-dump-stack.txt
> 
> Regarding the hibernation support issues you mentioned - I understand the 
> limitations with secure boot and VRAM eviction. In my case, I have secure 
> boot disabled and sufficient swap space, so the primary issue I'm 
> encountering is this TLB flush failure.
> 
> I'm happy to test any patches or help with further debugging if needed.
> 
> Thanks,
> Ionut
