On 1/9/26 03:34, Jesse.Zhang wrote:
> In certain error scenarios (e.g., malformed commands), a fence may never 
> become signaled, causing the kernel to hang indefinitely when waiting with 
> MAX_SCHEDULE_TIMEOUT.
> To prevent such hangs and ensure system responsiveness, replace the 
> indefinite timeout with a reasonable 2-second limit.
> 
> This ensures that even if a fence never signals, the wait will time out and 
> appropriate error handling can take place,
> rather than stalling the driver indefinitely.
> 
> v2: make timeout per queue type (adev->gfx_timeout vs adev->compute_timeout 
> vs adev->sdma_timeout) to be consistent with kernel queues. (Alex)
> 
> Suggested-by: Alex Deucher <[email protected]>
> Signed-off-by: Jesse Zhang <[email protected]>

Once more: Absolutely clear NAK to this here!

The function amdgpu_userq_wait_for_last_fence() *MUST* wait forever for the 
last fence to signal otherwise we run into massive problems.

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 98110f543307..402307581293 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -367,11 +367,29 @@ static int amdgpu_userq_map_helper(struct 
> amdgpu_usermode_queue *queue)
>  static int amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue 
> *queue)
>  {
>       struct amdgpu_userq_mgr *uq_mgr = queue->userq_mgr;
> +     struct amdgpu_device *adev = uq_mgr->adev;
>       struct dma_fence *f = queue->last_fence;
> +     signed long timeout;
>       int ret = 0;
>  
> +     /* Determine timeout based on queue type */
> +     switch (queue->queue_type) {
> +     case AMDGPU_RING_TYPE_GFX:
> +             timeout = adev->gfx_timeout;
> +             break;
> +     case AMDGPU_RING_TYPE_COMPUTE:
> +             timeout = adev->compute_timeout;
> +             break;
> +     case AMDGPU_RING_TYPE_SDMA:
> +             timeout = adev->sdma_timeout;
> +             break;
> +     default:
> +             timeout = adev->gfx_timeout;
> +             break;
> +     }
> +
>       if (f && !dma_fence_is_signaled(f)) {
> -             ret = dma_fence_wait_timeout(f, true, MAX_SCHEDULE_TIMEOUT);
> +             ret = dma_fence_wait_timeout(f, true, timeout);
>               if (ret <= 0) {
>                       drm_file_err(uq_mgr->file, "Timed out waiting for 
> fence=%llu:%llu\n",
>                                    f->context, f->seqno);

Reply via email to