On Mon, Dec 29, 2025 at 4:56 AM Jesse.Zhang <[email protected]> wrote:
>
> In certain error scenarios (e.g., malformed commands), a fence may never
> become signaled, causing the kernel to hang indefinitely when waiting with
> MAX_SCHEDULE_TIMEOUT. To prevent such hangs and ensure system
> responsiveness, replace the indefinite timeout with a reasonable 2-second
> limit.
>
> This ensures that even if a fence never signals, the wait will time out
> and appropriate error handling can take place, rather than stalling the
> driver indefinitely.
>
> Signed-off-by: Jesse Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 98110f543307..c28332f98aad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -371,7 +371,7 @@ static int amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue *queue)
>         int ret = 0;
>
>         if (f && !dma_fence_is_signaled(f)) {
> -               ret = dma_fence_wait_timeout(f, true, MAX_SCHEDULE_TIMEOUT);
> +               ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(2000));

Use adev->gfx_timeout to be consistent with kernel queues. You could
even make it per queue type (adev->gfx_timeout vs.
adev->compute_timeout vs. adev->sdma_timeout).

Alex

>                 if (ret <= 0) {
>                         drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
>                                      f->context, f->seqno);
> --
> 2.49.0
>
