On 29.05.24 at 15:44, Li, Yunxiang (Teddy) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

I don't think trying to add some reset handling here makes sense in the first 
place.
Part of the reset/recovery procedure is to signal all fences, and that includes
the one we are waiting for here.
So this wait should return immediately in a reset anyway.
As far as I can tell, the fence_ptrs that get polled here are not packaged into
a fence object, and in practice I see waits of tens of seconds before these time
out and the reset can begin. Also, after the reset there is often a long wait,
up to 2 minutes, for all the tlb_fence_work to time out (not addressed by this
patch; I still haven't figured out what's going on there).

The problem is that we don't force-complete the non-scheduler rings, e.g. MES, KIQ, etc.

Try removing this check from the loop in amdgpu_device_pre_asic_reset():

                if (!amdgpu_ring_sched_ready(ring))
                        continue;
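
To illustrate the effect of that check, here is a toy user-space model, not the
driver code. Everything prefixed with toy_ is hypothetical and only mimics the
idea: each ring has a last-emitted sequence number and a fence value that a
poller compares against it, and the pre-reset loop force-completes rings. The
assumption is that rings such as KIQ and MES are the ones reported as not
scheduler-ready.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring; NOT the kernel's struct amdgpu_ring. */
struct toy_ring {
    const char *name;
    bool sched_ready;      /* assumed false for non-scheduler rings (KIQ, MES) */
    uint32_t fence_value;  /* last sequence number seen as completed */
    uint32_t sync_seq;     /* last sequence number emitted on the ring */
};

/* Mark every fence emitted on the ring as completed. */
void toy_force_completion(struct toy_ring *ring)
{
    ring->fence_value = ring->sync_seq;
}

/*
 * Model of the pre-reset loop. With keep_check set, rings that are not
 * scheduler-ready are skipped, so their fences stay unsignaled.
 */
void toy_pre_reset(struct toy_ring *rings, int n, bool keep_check)
{
    for (int i = 0; i < n; i++) {
        if (keep_check && !rings[i].sched_ready)
            continue;
        toy_force_completion(&rings[i]);
    }
}

/* Count rings on which a poller would still be waiting. */
int toy_pending(const struct toy_ring *rings, int n)
{
    int pending = 0;

    for (int i = 0; i < n; i++)
        if (rings[i].fence_value != rings[i].sync_seq)
            pending++;
    return pending;
}
```

In this model, with one scheduler ring and two non-scheduler rings that all have
outstanding fences, keeping the check leaves two rings pending after the
pre-reset pass, while dropping it leaves none. Whether the real driver behaves
the same way is exactly what removing the check should tell us.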

Regards,
Christian.



Teddy
