Hi Alex,

Thank you for the detailed review and for pointing out the ordering issue.

You're absolutely right: I misunderstood the call sequence. Setting
resume_gpu_stable to false in amdgpu_device_resume() happens after
gfx_v9_0_cp_resume(), which defeats the purpose and permanently
disables the KIQ path.

However, I'm still seeing TLB flush failures after resuming from
hibernation on AMD Cezanne (Renoir):

  amdgpu: TLB flush failed for PASID xxxxx
  amdgpu: failed to write reg 28b4 wait reg 28c6
  amdgpu: failed to write reg 1a6f4 wait reg 1a706

If the KIQ sched.ready flag is being handled correctly, as you described,
what else could cause these failures during resume? Are there any known
issues with KIQ-based TLB invalidation after hibernation on GFX9?

Should I investigate:
- Timing issues with KIQ command submission during early resume?
- Power/clock gating states affecting KIQ functionality?
- Missing synchronization after KIQ initialization?

Any guidance on the correct direction to investigate would be appreciated.

Thanks,
Ionut
