Hi Alex,

Thank you for the detailed review and for pointing out the ordering issue.
You're absolutely right; I misunderstood the call sequence. Setting resume_gpu_stable to false in amdgpu_device_resume() happens only after gfx_v9_0_cp_resume() has already run. That defeats the purpose of the change and permanently disables the KIQ path.

However, I'm still seeing TLB flush failures after resuming from hibernation on AMD Cezanne (Renoir):

  amdgpu: TLB flush failed for PASID xxxxx
  amdgpu: failed to write reg 28b4 wait reg 28c6
  amdgpu: failed to write reg 1a6f4 wait reg 1a706

If the KIQ sched.ready flag is being handled correctly, as you described, what else could cause these failures during resume? Are there any known issues with KIQ-based TLB invalidation after hibernation on GFX9?

Should I investigate:
- timing issues with KIQ command submission during early resume?
- power/clock gating states affecting KIQ functionality?
- missing synchronization after KIQ initialization?

Any guidance on the right direction to investigate would be appreciated.

Thanks,
Ionut
