[AMD Official Use Only - AMD Internal Distribution Only]

Hi, Christian.

In multiple-VF mode (in our CI environment the number of VFs is 4), the
timeout value is shared across all VFs.
After the timeout value was changed to 2s, each VF only gets 0.5s, which
causes SDMA ring timeouts and triggers a GPU reset.
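
As a rough illustration of the arithmetic above, assuming the timeout budget
is divided evenly among the active VFs (per_vf_timeout_ms is a hypothetical
helper for this mail, not an amdgpu function):

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of amdgpu: models the job timeout as a
 * budget split evenly across all VFs, as described above.
 */
static unsigned int per_vf_timeout_ms(unsigned int total_ms,
				      unsigned int num_vfs)
{
	assert(num_vfs > 0);	/* at least one VF must be present */
	return total_ms / num_vfs;
}
```

Under that assumption, per_vf_timeout_ms(2000, 4) gives 500 ms per VF, while
the proposed 10000 ms default would give each of the 4 VFs 2500 ms.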


Thanks,
Chong.

-----Original Message-----
From: Koenig, Christian <[email protected]>
Sent: Tuesday, November 18, 2025 4:34 PM
To: Li, Chong(Alan) <[email protected]>; [email protected]
Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout 
is not enough for sdma job

Clear NAK to this patch.

It is explicitly requested by customers that we only have a 2 second timeout.

So you need a very good explanation to have that changed for SRIOV.

Regards,
Christian.

On 11/17/25 07:53, chong li wrote:
> Signed-off-by: chong li <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 4 ++--
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 69c29f47212d..4ab755eb5ec1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4341,10 +4341,15 @@ static int 
> amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>       int index = 0;
>       long timeout;
>       int ret = 0;
> +     long timeout_default;
>
> -     /* By default timeout for all queues is 2 sec */
> +     if (amdgpu_sriov_vf(adev))
> +             timeout_default = msecs_to_jiffies(10000);
> +     else
> +             timeout_default = msecs_to_jiffies(2000);
> +     /* By default timeout for all queues is 10 sec in sriov, 2 sec not in 
> sriov*/
>       adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout =
> -             adev->video_timeout = msecs_to_jiffies(2000);
> +             adev->video_timeout = timeout_default;
>
>       if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH))
>               return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index f508c1a9fa2c..43bdd6c1bec2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -358,10 +358,10 @@ module_param_named(svm_default_granularity, 
> amdgpu_svm_default_granularity, uint
>   * [GFX,Compute,SDMA,Video] to set individual timeouts.
>   * Negative values mean infinity.
>   *
> - * By default(with no lockup_timeout settings), the timeout for all queues 
> is 2000.
> + * By default(with no lockup_timeout settings), the timeout for all queues 
> is 10000 in sriov, 2000 not in sriov.
>   */
>  MODULE_PARM_DESC(lockup_timeout,
> -              "GPU lockup timeout in ms (default: 2000. 0: keep default 
> value. negative: infinity timeout), format: [single value for all] or 
> [GFX,Compute,SDMA,Video].");
> +              "GPU lockup timeout in ms (default: 10000 in sriov, 2000 not 
> in sriov. 0: keep default value. negative: infinity timeout), format: [single 
> value for all] or [GFX,Compute,SDMA,Video].");
>  module_param_string(lockup_timeout, amdgpu_lockup_timeout,
>                   sizeof(amdgpu_lockup_timeout), 0444);
>
