RE: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-25 Thread Li, Yunxiang (Teddy)
[Public]
> Looks like that is handled by the scheduler work item now as well. See
> function gfx_v9_0_fault() for an example.

Cool, so it is blocked by drm_sched_stop also. I think that covers everything.

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-25 Thread Christian König
On 24.04.24 at 15:13, Li, Yunxiang (Teddy) wrote: [Public] We have the KFD, FLR, the per engine one in the scheduler and IIRC one more for the CP (illegal operation and register write). I'm not sure about the CP one, but all others should be handled correctly with the V2 patch as far as I

RE: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-24 Thread Li, Yunxiang (Teddy)
[Public]
> We have the KFD, FLR, the per engine one in the scheduler and IIRC one more
> for the CP (illegal operation and register write).
>
> I'm not sure about the CP one, but all others should be handled correctly
> with the V2 patch as far as I can see.

Where can I find the CP one?

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-24 Thread Christian König
On 23.04.24 at 20:05, Felix Kuehling wrote: On 2024-04-23 01:50, Christian König wrote: On 22.04.24 at 21:45, Yunxiang Li wrote: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes.

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-23 Thread Felix Kuehling
On 2024-04-23 01:50, Christian König wrote: On 22.04.24 at 21:45, Yunxiang Li wrote: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other reset

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Christian König
On 23.04.24 at 05:13, Li, Yunxiang (Teddy) wrote: [Public] We can't do this technically as there are cases where we skip full device reset (even then amdgpu_in_reset will return true). The better thing to do is to move amdgpu_device_stop_pending_resets() later in gpu_recover()- if a device

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Christian König
On 22.04.24 at 21:45, Yunxiang Li wrote: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other reset sources. NAK, that isn't how this should be

RE: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Li, Yunxiang (Teddy)
[Public]
> We can't do this technically as there are cases where we skip full device
> reset (even then amdgpu_in_reset will return true). The better thing to do is
> to move amdgpu_device_stop_pending_resets() later in
> gpu_recover()- if a device has undergone full reset, then cancel all
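A rough user-space C model of the alternative quoted above: rather than filtering reset requests on entry, cancel queued requests at the end of recovery, and only when a full device reset actually happened. All names here are hypothetical and this is not amdgpu code; stop_pending_resets() merely stands in for amdgpu_device_stop_pending_resets(), and reading "cancel all" as "cancel all pending reset requests" is an assumption, since the quoted text is cut off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery path; not the amdgpu structures. */
struct model_recovery {
    int pending;     /* queued reset requests not yet serviced */
    int full_resets; /* full device resets actually performed */
};

static void queue_reset_request(struct model_recovery *r)
{
    r->pending++;
}

/* Hypothetical stand-in for amdgpu_device_stop_pending_resets(). */
static void stop_pending_resets(struct model_recovery *r)
{
    r->pending = 0;
}

/* One recovery pass. full_reset models whether recovery escalated to a
 * full device reset; requests_during_recovery models other sources
 * (e.g. KFD) queueing requests while recovery is still running. */
static void recover(struct model_recovery *r, bool full_reset,
                    int requests_during_recovery)
{
    /* Consume the request that triggered this pass. */
    if (r->pending > 0)
        r->pending--;
    r->pending += requests_during_recovery;
    if (full_reset) {
        r->full_resets++;
        /* Cancelling here, after the reset, also drops requests that
         * arrived mid-recovery; the full reset already covered them. */
        stop_pending_resets(r);
    }
    assert(r->pending >= 0);
}
```

In this model, a request that arrives during a lighter recovery (no full reset) survives and is serviced later, which is the case the quoted message says an amdgpu_in_reset() check alone would get wrong.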

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Lazar, Lijo
On 4/23/2024 1:15 AM, Yunxiang Li wrote:
> Reset request from KFD is missing a check for if a reset is already in
> progress, this causes a second reset to be triggered right after the
> previous one finishes. Add the check to align with the other reset sources.
>
> Signed-off-by: Yunxiang Li

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Felix Kuehling
On 2024-04-22 16:14, Alex Deucher wrote: On Mon, Apr 22, 2024 at 3:52 PM Yunxiang Li wrote: Reset request from KFD is missing a check for if a reset is already in progress, this causes a second reset to be triggered right after the previous one finishes. Add the check to align with the other

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Alex Deucher
On Mon, Apr 22, 2024 at 3:52 PM Yunxiang Li wrote:
>
> Reset request from KFD is missing a check for if a reset is already in
> progress, this causes a second reset to be triggered right after the
> previous one finishes. Add the check to align with the other reset sources.

Acked-by: Alex
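For illustration only, a small user-space C model of what the original patch describes: a reset request first checks whether a reset is already in progress and is dropped if so. The struct and function names (model_dev, request_reset, finish_reset) are hypothetical and are not amdgpu code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a device; not the amdgpu structures. */
struct model_dev {
    atomic_bool in_reset;   /* true while a recovery is running */
    atomic_int resets_done; /* recoveries actually carried out */
};

static void model_dev_init(struct model_dev *dev)
{
    atomic_init(&dev->in_reset, false);
    atomic_init(&dev->resets_done, 0);
}

/* Returns true if the caller should perform the reset, false if one is
 * already in progress and this request is dropped. The compare-and-swap
 * keeps two concurrent requesters from both starting a reset. */
static bool request_reset(struct model_dev *dev)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&dev->in_reset, &expected, true);
}

/* Called by the winner of request_reset() once recovery is done. */
static void finish_reset(struct model_dev *dev)
{
    assert(atomic_load(&dev->in_reset));
    atomic_fetch_add(&dev->resets_done, 1);
    atomic_store(&dev->in_reset, false);
}
```

The thread NAKs this approach: an in-progress flag alone does not say whether the device was fully reset, because amdgpu_in_reset can return true even when the full device reset is skipped, so a dropped request might be one that was still needed.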