On 3/18/26 08:47, Sunil Khatri wrote:
> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e.
> 
> The patch has caused regression for userqueues where user is stuck
> and is waiting for fences and a gpu reset is triggered in kernel.
> Also for any of the parameters when count is zero, the driver does
> not read from the pointer and having that check is overkill.
> 
> Application:
> MESA: error: amdgpu: getting wait num_fences failed
> MESA: error: amdgpu: getting wait fences failed
> MESA: error: amdgpu: getting wait num_fences failed
> MESA: error: amdgpu: getting wait fences failed
> 
> Dmesg:
> [  122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type 
> 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
> [  122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type 
> 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
> [  124.687518] amdgpu 0000:0a:00.0: Dumping IP State
> [  124.688351] amdgpu 0000:0a:00.0: Dumping IP State Completed
> [  124.688355] amdgpu 0000:0a:00.0: [drm] AMDGPU device coredump file has 
> been created
> [  124.688357] amdgpu 0000:0a:00.0: [drm] Check your 
> /sys/class/drm/card0/device/devcoredump/data
> [  124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, signaled seq=569, 
> emitted seq=571
> [  124.688366] amdgpu 0000:0a:00.0:  Process Xwayland pid 3471 thread 
> Xwayland:cs0 pid 3479
> [  124.688369] amdgpu 0000:0a:00.0: Starting gfx_0.0.0 ring reset
> [  126.560451] amdgpu 0000:0a:00.0: MES(0) failed to respond to msg=RESET
> [  126.560456] amdgpu 0000:0a:00.0: failed to detect and reset
> [  126.560460] amdgpu 0000:0a:00.0: Failed to detect and reset queues, err 
> (-110)
> [  128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed
> [  128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  1
> [  128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, skipping HW 
> reset
> [  128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
> [  128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but recovered 
> through reset
> [  128.804574] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  6
> [  128.816663] amdgpu 0000:0a:00.0: Dumping IP State
> [  128.817458] amdgpu 0000:0a:00.0: Dumping IP State Completed
> [  130.963939] amdgpu 0000:0a:00.0: MES(1) failed to respond to 
> msg=REMOVE_QUEUE
> [  130.963949] amdgpu 0000:0a:00.0: failed to unmap legacy queue
> 
> Cc: Jesse Zhang <[email protected]>
> Signed-off-by: Sunil Khatri <[email protected]>
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 29 -------------------
>  1 file changed, 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 3fcd70a38374..0d9a13081f2f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, 
> void *data,
>           args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
>               return -EINVAL;
>  
> -     /* Reject non-NULL pointers paired with a zero count. */
> -     if (!args->num_syncobj_handles && args->syncobj_handles)
> -             return -EINVAL;
> -
> -     if (!args->num_bo_read_handles && args->bo_read_handles)
> -             return -EINVAL;
> -
> -     if (!args->num_bo_write_handles && args->bo_write_handles)
> -             return -EINVAL;
> -
>       num_syncobj_handles = args->num_syncobj_handles;
>       syncobj_handles = 
> memdup_array_user(u64_to_user_ptr(args->syncobj_handles),
>                                           num_syncobj_handles, sizeof(u32));
> @@ -950,25 +940,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void 
> *data,
>           wait_info->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
>               return -EINVAL;
>  
> -     /* Reject non-NULL pointers paired with a zero count: the pointer
> -      * is meaningless and indicates inconsistent input from userspace.
> -      */
> -     if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles)
> -             return -EINVAL;
> -
> -     if (!wait_info->num_syncobj_timeline_handles &&
> -         (wait_info->syncobj_timeline_handles || 
> wait_info->syncobj_timeline_points))
> -             return -EINVAL;
> -
> -     if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles)
> -             return -EINVAL;
> -
> -     if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles)
> -             return -EINVAL;
> -
> -     if (!wait_info->num_fences && wait_info->out_fences)
> -             return -EINVAL;
> -

Mhm, in general such checks look valid to me.

My educated guess is that userspace sets num_fences = 0 to query if it needs to 
resize the pointer out_fences or not.

If you have time please double check which check fails here.

Apart from that Reviewed-by: Christian König <[email protected]>.

Regards,
Christian.

>       num_syncobj = wait_info->num_syncobj_handles;
>       ptr = u64_to_user_ptr(wait_info->syncobj_handles);
>       syncobj_handles = memdup_array_user(ptr, num_syncobj, sizeof(u32));

Reply via email to