[AMD Official Use Only - General] For the unlocking, I have tested on both nv21 and nv31, the unlock/lock paring looks not break.
On asic <gfx11, the (!adev->gfx.is_poweron) always true, this parameter is introduced from GFX11. On gfx11, in the reset (suspend then resume) process, after suspend, gfx poweron right after smu resumed successfully. The (!adev->gfx.is_poweron) is always false when trylock the reset_domin->sem. Not return ahead in gfx11 and continue gpu tlb flush in IP specific (gmc v11) callback. Then unlock after gpu tlb flush: void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid, uint32_t vmhub, uint32_t flush_type) { if (!down_read_trylock(&adev->reset_domain->sem) && !adev->gfx.is_poweron) //lock return; ... adev->gmc.gmc_funcs->flush_gpu_tlb(adev, vmid, vmhub, flush_type); up_read(&adev->reset_domain->sem); //unlock return; } Thanks, Feifei -----Original Message----- From: Xu, Feifei Sent: Tuesday, October 10, 2023 5:44 PM To: Christian König <ckoenig.leichtzumer...@gmail.com>; Wang, Yang(Kevin) <kevinyang.w...@amd.com>; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian <christian.koe...@amd.com>; Zhang, Hawking <hawking.zh...@amd.com> Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb >> Then a TLB flush shouldn't be necessary on reset. A reset implies that the >> TLB is cleared as well. Hmm, in current implementation, when we say a reset implied that the TLB is cleared, assume that the TLB clear is purely hardware action. There's no gpu tlb flush initiated by software/driver after suspend. While in some asics of gfx11 (like nv31), gpu tlb need to be flushed by software/driver after smu resume successfully intentionally. Without the gpu tlb flush on nv31, S3 or reset will be break with gfx page fault. >> First of all the patch is broken because you only handle the locking, but >> not the unlocking part. For the unlocking part, realized that you and Kevin are correct. Lock/unlock not paried. Thanks, Feifei -----Original Message----- From: Christian König <ckoenig.leichtzumer...@gmail.com> Sent: Monday, October 9, 2023 4:52 PM To: Xu, Feifei <feifei...@amd.com>; Wang, Yang(Kevin) <kevinyang.w...@amd.com>; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian <christian.koe...@amd.com>; Zhang, Hawking <hawking.zh...@amd.com> Subject: Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb Am 09.10.23 um 03:50 schrieb Xu, Feifei: > [AMD Official Use Only - General] > > Hi, > >>> Based on your description, the above code should use "||" instead of >>> "&&", > && is to add more restriction here. To avoid skipping necessary TLB flush by > return. > For Asics < GFX11, !adev->gfx.is_poweron is always true (this > paremeter is intrudoced from GFX11), only depends on reset_domain->sem; For > Asics = GFX11, !adev->gfx.is_poweron might be false (which gfx might already > poweron in the reset), this will make the if () not ture, return will not be > executed, thus flush TLB. First of all the patch is broken because you only handle the locking, but not the unlocking part. Then a TLB flush shouldn't be necessary on reset. A reset implies that the TLB is cleared as well. We discussed the possibility to avoid that, but this is not supposed to be happening at the moment. Regards, Christian. > >>> And after merging code into one line may result in the lock not being >>> released if the lock can be acquired success. > If !adev->gfx.is_poweron is true, the reset_domin->sem will not be > down_read_trylock, thus could avoid this deadlock. > > Thanks, > Feifei > > -----Original Message----- > From: Wang, Yang(Kevin) <kevinyang.w...@amd.com> > Sent: Sunday, October 8, 2023 9:36 PM > To: Xu, Feifei <feifei...@amd.com>; amd-gfx@lists.freedesktop.org > Cc: Xu, Feifei <feifei...@amd.com>; Xu, Feifei <feifei...@amd.com>; > Koenig, Christian <christian.koe...@amd.com>; Zhang, Hawking > <hawking.zh...@amd.com> > Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip > flush_gpu_tlb > > > -----Original Message----- > From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of > Feifei Xu > Sent: Sunday, October 8, 2023 6:07 PM > To: amd-gfx@lists.freedesktop.org > Cc: Xu, Feifei <feifei...@amd.com>; Xu, Feifei <feifei...@amd.com>; > Koenig, Christian <christian.koe...@amd.com>; Zhang, Hawking > <hawking.zh...@amd.com> > Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb > > To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb > after reset on GFX11. > Gfxhub tlb flush need check if adev->gfx.is_poweron set. > > Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb > v2") > > Signed-off-by: Feifei Xu <feifei...@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > index 2f9bb86edd71..048d32edee88 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > @@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, > uint32_t vmid, > /* > * A GPU reset should flush all TLBs anyway, so no need to do > * this while one is ongoing. > + * For GFX11, gfxhub flush check if adev->gfx.is_poweron is > set. > */ > - if (!down_read_trylock(&adev->reset_domain->sem)) > + if (!down_read_trylock(&adev->reset_domain->sem) && > +!adev->gfx.is_poweron) > return; > > [Kevin]: > Based on your description, the above code should use "||" instead of > "&&", And after merging code into one line may result in the lock not being > released if the lock can be acquired success. > > Best Regards, > Kevin > > if (adev->gmc.flush_tlb_needs_extra_type_2) > -- > 2.34.1 >