RE: [PATCH] drm/amdgpu: Increase tlb flush timeout for sriov

Liu, Cheng Zhe Wed, 19 May 2021 04:09:04 -0700

[AMD Official Use Only]

We support at most 12 VFs. In the worst case, the first 11 VFs all fail IDLE
and do an FLR, so it takes 11 * 500 ms to switch to the 12th VF.
That is why I set the timeout to 12 * 500 ms.
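
For reference, the arithmetic behind the 6000000 in the patch can be sketched
as below. The macro and function names are hypothetical and exist only for
this illustration; the 12 VF limit and the 500 ms per-VF figure are the ones
stated above.

```c
#include <stdint.h>

/* Figures taken from the explanation above. */
#define MAX_VFS        12u    /* at most 12 VFs are supported */
#define VF_SLOT_MSEC   500u   /* worst-case time budgeted per VF */
#define USEC_PER_MSEC  1000u

/*
 * Hypothetical helper (not part of amdgpu): worst-case wait in
 * microseconds when num_vfs VFs each get a VF_SLOT_MSEC budget.
 */
static uint32_t sriov_worst_case_timeout_usec(uint32_t num_vfs)
{
	return num_vfs * VF_SLOT_MSEC * USEC_PER_MSEC;
}
```

With num_vfs = MAX_VFS this gives 12 * 500 * 1000 = 6000000 usec, i.e. 6
seconds, which is the value hard-coded as sriov_usec_timeout in the patch.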


-----Original Message-----
From: Christian König <[email protected]> 
Sent: Wednesday, May 19, 2021 6:08 PM
To: Liu, Cheng Zhe <[email protected]>; [email protected]
Cc: Xiao, Jack <[email protected]>; Xu, Feifei <[email protected]>; Wang, 
Kevin(Yang) <[email protected]>; Tuikov, Luben <[email protected]>; 
Deucher, Alexander <[email protected]>; Koenig, Christian 
<[email protected]>; Zhang, Hawking <[email protected]>
Subject: Re: [PATCH] drm/amdgpu: Increase tlb flush timeout for sriov

Am 19.05.21 um 11:32 schrieb Chengzhe Liu:
> When there are 12 VFs, we need to increase the timeout

NAK, 6 seconds is way too long to wait polling on a fence.

Why should an invalidation take that long? The engines are per VF just to avoid
exactly that problem.

Christian.
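
To illustrate the concern about polling, here is a minimal userspace sketch of
a busy-poll wait with a microsecond budget. It is not the amdgpu
implementation: the fake_fence structure, the 5 usec step and all function
names are assumptions made for this example, and time is simulated rather
than measured.

```c
#include <stdint.h>

#define POLL_STEP_USEC 5 /* assumed polling interval, illustration only */

/* Hypothetical stand-in for a hardware fence, driven by simulated time. */
struct fake_fence {
	uint32_t seq;        /* last sequence number already signaled */
	uint32_t next_seq;   /* sequence number that will be signaled next */
	long signal_at_usec; /* simulated time at which next_seq is signaled */
	long now_usec;       /* simulated clock, advanced by the poll loop */
};

/* Return the sequence number visible at the current simulated time. */
static uint32_t fake_fence_read(const struct fake_fence *f)
{
	return f->now_usec >= f->signal_at_usec ? f->next_seq : f->seq;
}

/*
 * Busy-poll until the fence reaches wait_seq or the budget is used up.
 * Returns the remaining budget in microseconds, or 0 on timeout.
 */
static long fence_wait_polling_sketch(struct fake_fence *f,
				      uint32_t wait_seq, long timeout_usec)
{
	uint32_t seq;

	do {
		seq = fake_fence_read(f);
		f->now_usec += POLL_STEP_USEC; /* stands in for a busy-wait delay */
		timeout_usec -= POLL_STEP_USEC;
	} while ((int32_t)(wait_seq - seq) > 0 && timeout_usec > 0);

	return timeout_usec > 0 ? timeout_usec : 0;
}
```

In this sketch the caller does nothing but spin until the fence signals or
the budget runs out, so a fence that never signals keeps the caller busy for
the whole timeout; with a 6000000 usec budget that is 6 seconds of polling.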

>
> Signed-off-by: Chengzhe Liu <[email protected]>
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 6 +++++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 +++++-
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index f02dc904e4cf..a5f005c5d0ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -404,6 +404,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>       uint32_t seq;
>       uint16_t queried_pasid;
>       bool ret;
> +     uint32_t sriov_usec_timeout = 6000000;  /* wait for 12 * 500ms for SRIOV */
>       struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
> @@ -422,7 +423,10 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   
>               amdgpu_ring_commit(ring);
>               spin_unlock(&adev->gfx.kiq.ring_lock);
> -             r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> +             if (amdgpu_sriov_vf(adev))
> +                     r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
> +             else
> +                     r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>               if (r < 1) {
>                       dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
>                       return -ETIME;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index ceb3968d8326..e4a18d8f75c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -857,6 +857,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>       uint32_t seq;
>       uint16_t queried_pasid;
>       bool ret;
> +     uint32_t sriov_usec_timeout = 6000000;  /* wait for 12 * 500ms for SRIOV */
>       struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
> @@ -896,7 +897,10 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>   
>               amdgpu_ring_commit(ring);
>               spin_unlock(&adev->gfx.kiq.ring_lock);
> -             r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> +             if (amdgpu_sriov_vf(adev))
> +                     r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
> +             else
> +                     r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>               if (r < 1) {
>                       dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
>                       up_read(&adev->reset_sem);
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx