On 3/18/26 08:47, Sunil Khatri wrote:
> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e.
>
> The patch has caused regression for userqueues where user is stuck
> and is waiting for fences and a gpu reset is triggered in kernel.
> Also for any of the parameters when count is zero, the driver does
> not read from the pointer and having that check is overkill.
>
> Application:
> MESA: error: amdgpu: getting wait num_fences failed
> MESA: error: amdgpu: getting wait fences failed
> MESA: error: amdgpu: getting wait num_fences failed
> MESA: error: amdgpu: getting wait fences failed
>
> Dmesg:
> [ 122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type
> 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
> [ 122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type
> 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
> [ 124.687518] amdgpu 0000:0a:00.0: Dumping IP State
> [ 124.688351] amdgpu 0000:0a:00.0: Dumping IP State Completed
> [ 124.688355] amdgpu 0000:0a:00.0: [drm] AMDGPU device coredump file has
> been created
> [ 124.688357] amdgpu 0000:0a:00.0: [drm] Check your
> /sys/class/drm/card0/device/devcoredump/data
> [ 124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, signaled seq=569,
> emitted seq=571
> [ 124.688366] amdgpu 0000:0a:00.0: Process Xwayland pid 3471 thread
> Xwayland:cs0 pid 3479
> [ 124.688369] amdgpu 0000:0a:00.0: Starting gfx_0.0.0 ring reset
> [ 126.560451] amdgpu 0000:0a:00.0: MES(0) failed to respond to msg=RESET
> [ 126.560456] amdgpu 0000:0a:00.0: failed to detect and reset
> [ 126.560460] amdgpu 0000:0a:00.0: Failed to detect and reset queues, err
> (-110)
> [ 128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed
> [ 128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source: 1
> [ 128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, skipping HW
> reset
> [ 128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
> [ 128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but recovered
> through reset
> [ 128.804574] amdgpu 0000:0a:00.0: GPU reset begin!. Source: 6
> [ 128.816663] amdgpu 0000:0a:00.0: Dumping IP State
> [ 128.817458] amdgpu 0000:0a:00.0: Dumping IP State Completed
> [ 130.963939] amdgpu 0000:0a:00.0: MES(1) failed to respond to
> msg=REMOVE_QUEUE
> [ 130.963949] amdgpu 0000:0a:00.0: failed to unmap legacy queue
>
> Cc: Jesse Zhang <[email protected]>
> Signed-off-by: Sunil Khatri <[email protected]>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 -------------------
> 1 file changed, 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 3fcd70a38374..0d9a13081f2f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev,
> void *data,
> args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
> return -EINVAL;
>
> - /* Reject non-NULL pointers paired with a zero count. */
> - if (!args->num_syncobj_handles && args->syncobj_handles)
> - return -EINVAL;
> -
> - if (!args->num_bo_read_handles && args->bo_read_handles)
> - return -EINVAL;
> -
> - if (!args->num_bo_write_handles && args->bo_write_handles)
> - return -EINVAL;
> -
> num_syncobj_handles = args->num_syncobj_handles;
> syncobj_handles =
> memdup_array_user(u64_to_user_ptr(args->syncobj_handles),
> num_syncobj_handles, sizeof(u32));
> @@ -950,25 +940,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void
> *data,
> wait_info->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
> return -EINVAL;
>
> - /* Reject non-NULL pointers paired with a zero count: the pointer
> - * is meaningless and indicates inconsistent input from userspace.
> - */
> - if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles)
> - return -EINVAL;
> -
> - if (!wait_info->num_syncobj_timeline_handles &&
> - (wait_info->syncobj_timeline_handles ||
> wait_info->syncobj_timeline_points))
> - return -EINVAL;
> -
> - if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles)
> - return -EINVAL;
> -
> - if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles)
> - return -EINVAL;
> -
> - if (!wait_info->num_fences && wait_info->out_fences)
> - return -EINVAL;
> -
Mhm, in general such checks look valid to me.
My educated guess is that userspace sets num_fences = 0 to query if it needs to
resize the pointer out_fences or not.
If you have time please double check which check fails here.
Apart from that Reviewed-by: Christian König <[email protected]>.
Regards,
Christian.
> num_syncobj = wait_info->num_syncobj_handles;
> ptr = u64_to_user_ptr(wait_info->syncobj_handles);
> syncobj_handles = memdup_array_user(ptr, num_syncobj, sizeof(u32));