Hi Chong, yeah and exactly that argumentation is not correct.
We have to guarantee a min minimum response time and that is what the timeout is all about. And it doesn't matter if the available HW time is split between 1,2,4 or 8 virtual functions. The minimum reponse time we need to guarantee is still the same, it's just that the available HW time is lowered. So as long as we don't have an explicit customer request which asks for longer default timeouts this change is rejected. Regards, Christian. On 11/18/25 11:08, Li, Chong(Alan) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi, Christian. > > what I mean is: > in sriov mode, when customer need limit timeout value , > they should set the "lockup_timeout" according to the vf number they load. > > > Why: > > The real timeout value in sriov for each vf is " lockup_timeout / vf_number", > > As you said they want to limit the timeout to 2s, > when customer load 8vf, they should set the "lockup_timeout" to 16s, 4vf > need set "lockup_timeout" to 8s. > > > (After we test, when value "lockup_timeout" set to 2s, the 4vf mode can't > work as each vf only get 0.5s) > > > > > > Thanks, > Chong. > > > > -----Original Message----- > From: Koenig, Christian <[email protected]> > Sent: Tuesday, November 18, 2025 5:31 PM > To: Li, Chong(Alan) <[email protected]>; [email protected] > Cc: Chen, Horace <[email protected]> > Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout > is not enough for sdma job > > Hi Chong, > > that is not a valid justification. > > What customer requirement is causing this SDMA timeout? > > When it is just your CI system then change the CI system. > > As long as you can't come up with a customer requirement this change is > rejected. > > Regards, > Christian. > > On 11/18/25 10:29, Li, Chong(Alan) wrote: >> [AMD Official Use Only - AMD Internal Distribution Only] >> >> Hi, Christian. >> >> In multiple vf mode( in our CI environment the vf number is 4), the timeout >> value is shared across all vfs. >> After timeout value change to 2s, each vf only get 0.5s, cause sdma ring >> timeout and trigger gpu reset. >> >> >> Thanks, >> Chong. >> >> -----Original Message----- >> From: Koenig, Christian <[email protected]> >> Sent: Tuesday, November 18, 2025 4:34 PM >> To: Li, Chong(Alan) <[email protected]>; [email protected] >> Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds >> timeout is not enough for sdma job >> >> Clear NAK to this patch. >> >> It is explicitely requested by customers that we only have a 2 second >> timeout. >> >> So you need a very good explanation to have that changed for SRIOV. >> >> Regards, >> Christian. >> >> On 11/17/25 07:53, chong li wrote: >>> Signed-off-by: chong li <[email protected]> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++-- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++-- >>> 2 files changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 69c29f47212d..4ab755eb5ec1 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -4341,10 +4341,15 @@ static int >>> amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev) >>> int index = 0; >>> long timeout; >>> int ret = 0; >>> + long timeout_default; >>> >>> - /* By default timeout for all queues is 2 sec */ >>> + if (amdgpu_sriov_vf(adev)) >>> + timeout_default = msecs_to_jiffies(10000); >>> + else >>> + timeout_default = msecs_to_jiffies(2000); >>> + /* By default timeout for all queues is 10 sec in sriov, 2 sec not in >>> sriov*/ >>> adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout = >>> - adev->video_timeout = msecs_to_jiffies(2000); >>> + adev->video_timeout = timeout_default; >>> >>> if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) >>> return 0; >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> index f508c1a9fa2c..43bdd6c1bec2 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> @@ -358,10 +358,10 @@ module_param_named(svm_default_granularity, >>> amdgpu_svm_default_granularity, uint >>> * [GFX,Compute,SDMA,Video] to set individual timeouts. >>> * Negative values mean infinity. >>> * >>> - * By default(with no lockup_timeout settings), the timeout for all queues >>> is 2000. >>> + * By default(with no lockup_timeout settings), the timeout for all queues >>> is 10000 in sriov, 2000 not in sriov. >>> */ >>> MODULE_PARM_DESC(lockup_timeout, >>> - "GPU lockup timeout in ms (default: 2000. 0: keep default >>> value. negative: infinity timeout), format: [single value for all] or >>> [GFX,Compute,SDMA,Video]."); >>> + "GPU lockup timeout in ms (default: 10000 in sriov, 2000 not >>> in sriov. 0: keep default value. negative: infinity timeout), format: >>> [single value for all] or [GFX,Compute,SDMA,Video]."); >>> module_param_string(lockup_timeout, amdgpu_lockup_timeout, >>> sizeof(amdgpu_lockup_timeout), 0444); >>> >> >
