[AMD Official Use Only - AMD Internal Distribution Only] Hi, Christian.
what I mean is: in sriov mode, when customer need limit timeout value , they should set the "lockup_timeout" according to the vf number they load. Why: The real timeout value in sriov for each vf is " lockup_timeout / vf_number", As you said they want to limit the timeout to 2s, when customer load 8vf, they should set the "lockup_timeout" to 16s, 4vf need set "lockup_timeout" to 8s. (After we test, when value "lockup_timeout" set to 2s, the 4vf mode can't work as each vf only get 0.5s) Thanks, Chong. -----Original Message----- From: Koenig, Christian <[email protected]> Sent: Tuesday, November 18, 2025 5:31 PM To: Li, Chong(Alan) <[email protected]>; [email protected] Cc: Chen, Horace <[email protected]> Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout is not enough for sdma job Hi Chong, that is not a valid justification. What customer requirement is causing this SDMA timeout? When it is just your CI system then change the CI system. As long as you can't come up with a customer requirement this change is rejected. Regards, Christian. On 11/18/25 10:29, Li, Chong(Alan) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi, Christian. > > In multiple vf mode( in our CI environment the vf number is 4), the timeout > value is shared across all vfs. > After timeout value change to 2s, each vf only get 0.5s, cause sdma ring > timeout and trigger gpu reset. > > > Thanks, > Chong. > > -----Original Message----- > From: Koenig, Christian <[email protected]> > Sent: Tuesday, November 18, 2025 4:34 PM > To: Li, Chong(Alan) <[email protected]>; [email protected] > Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout > is not enough for sdma job > > Clear NAK to this patch. > > It is explicitely requested by customers that we only have a 2 second timeout. > > So you need a very good explanation to have that changed for SRIOV. > > Regards, > Christian. > > On 11/17/25 07:53, chong li wrote: >> Signed-off-by: chong li <[email protected]> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++-- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++-- >> 2 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 69c29f47212d..4ab755eb5ec1 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -4341,10 +4341,15 @@ static int >> amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev) >> int index = 0; >> long timeout; >> int ret = 0; >> + long timeout_default; >> >> - /* By default timeout for all queues is 2 sec */ >> + if (amdgpu_sriov_vf(adev)) >> + timeout_default = msecs_to_jiffies(10000); >> + else >> + timeout_default = msecs_to_jiffies(2000); >> + /* By default timeout for all queues is 10 sec in sriov, 2 sec not in >> sriov*/ >> adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout = >> - adev->video_timeout = msecs_to_jiffies(2000); >> + adev->video_timeout = timeout_default; >> >> if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) >> return 0; >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> index f508c1a9fa2c..43bdd6c1bec2 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> @@ -358,10 +358,10 @@ module_param_named(svm_default_granularity, >> amdgpu_svm_default_granularity, uint >> * [GFX,Compute,SDMA,Video] to set individual timeouts. >> * Negative values mean infinity. >> * >> - * By default(with no lockup_timeout settings), the timeout for all queues >> is 2000. >> + * By default(with no lockup_timeout settings), the timeout for all queues >> is 10000 in sriov, 2000 not in sriov. >> */ >> MODULE_PARM_DESC(lockup_timeout, >> - "GPU lockup timeout in ms (default: 2000. 0: keep default >> value. negative: infinity timeout), format: [single value for all] or >> [GFX,Compute,SDMA,Video]."); >> + "GPU lockup timeout in ms (default: 10000 in sriov, 2000 not >> in sriov. 0: keep default value. negative: infinity timeout), format: >> [single value for all] or [GFX,Compute,SDMA,Video]."); >> module_param_string(lockup_timeout, amdgpu_lockup_timeout, >> sizeof(amdgpu_lockup_timeout), 0444); >> >
