WAIT ioctl argument validation"

Zhang, Jesse(Jie) Wed, 18 Mar 2026 02:32:40 -0700

[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Khatri, Sunil <[email protected]>
> Sent: Wednesday, March 18, 2026 4:22 PM
> To: Koenig, Christian <[email protected]>; Khatri, Sunil
> <[email protected]>; Deucher, Alexander <[email protected]>
> Cc: [email protected]; Zhang, Jesse(Jie) <[email protected]>
> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument
> validation"
>
>
> On 18-03-2026 01:29 pm, Christian König wrote:
> >
> > On 3/18/26 08:47, Sunil Khatri wrote:
> >> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e.
> >>
> >> The patch has caused regression for userqueues where user is stuck
> >> and is waiting for fences and a gpu reset is triggered in kernel.
> >> Also for any of the parameters when count is zero, the driver does
> >> not read from the pointer and having that check is overkill.
> >>
> >> Application:
> >> MESA: error: amdgpu: getting wait num_fences failed
> >> MESA: error: amdgpu: getting wait fences failed
> >> MESA: error: amdgpu: getting wait num_fences failed
> >> MESA: error: amdgpu: getting wait fences failed
After I reverted this patch, the error still occurs when running glxgears.
Does it work fine on your end if you don't apply this patch?


amdgpu: getting wait fences failed
amdgpu: getting wait fences failed
amdgpu: getting wait fences failed

Thanks
Jesse


> >>
> >> Dmesg:
> >> [  122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail
> >> 0x00000000, type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [
> >> 122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000,
> >> type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [  124.687518]
> >> amdgpu 0000:0a:00.0: Dumping IP State [  124.688351] amdgpu
> >> 0000:0a:00.0: Dumping IP State Completed [  124.688355] amdgpu
> >> 0000:0a:00.0: [drm] AMDGPU device coredump file has been created [
> >> 124.688357] amdgpu 0000:0a:00.0: [drm] Check your
> >> /sys/class/drm/card0/device/devcoredump/data
> >> [  124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, signaled
> >> seq=569, emitted seq=571 [  124.688366] amdgpu 0000:0a:00.0:  Process
> >> Xwayland pid 3471 thread Xwayland:cs0 pid 3479 [  124.688369] amdgpu
> >> 0000:0a:00.0: Starting gfx_0.0.0 ring reset [  126.560451] amdgpu
> >> 0000:0a:00.0: MES(0) failed to respond to msg=RESET [  126.560456]
> >> amdgpu 0000:0a:00.0: failed to detect and reset [  126.560460] amdgpu
> >> 0000:0a:00.0: Failed to detect and reset queues, err (-110) [
> >> 128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed [
> >> 128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  1 [
> >> 128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, skipping HW
> reset [  128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
> >> [  128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but
> >> recovered through reset [  128.804574] amdgpu 0000:0a:00.0: GPU reset
> >> begin!. Source:  6 [  128.816663] amdgpu 0000:0a:00.0: Dumping IP
> >> State [  128.817458] amdgpu 0000:0a:00.0: Dumping IP State Completed
> >> [  130.963939] amdgpu 0000:0a:00.0: MES(1) failed to respond to
> >> msg=REMOVE_QUEUE [  130.963949] amdgpu 0000:0a:00.0: failed to unmap
> >> legacy queue
> >>
> >> Cc: Jesse Zhang <[email protected]>
> >> Signed-off-by: Sunil Khatri <[email protected]>
> >> ---
> >>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 29 -------------------
> >>   1 file changed, 29 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> index 3fcd70a38374..0d9a13081f2f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device
> *dev, void *data,
> >>        args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
> >>            return -EINVAL;
> >>
> >> -  /* Reject non-NULL pointers paired with a zero count. */
> >> -  if (!args->num_syncobj_handles && args->syncobj_handles)
> >> -          return -EINVAL;
> >> -
> >> -  if (!args->num_bo_read_handles && args->bo_read_handles)
> >> -          return -EINVAL;
> >> -
> >> -  if (!args->num_bo_write_handles && args->bo_write_handles)
> >> -          return -EINVAL;
> >> -
> >>    num_syncobj_handles = args->num_syncobj_handles;
> >>    syncobj_handles = memdup_array_user(u64_to_user_ptr(args-
> >syncobj_handles),
> >>                                        num_syncobj_handles, sizeof(u32)); 
> >> @@ -
> 950,25 +940,6 @@
> >> int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> >>        wait_info->num_bo_read_handles >
> AMDGPU_USERQ_MAX_HANDLES)
> >>            return -EINVAL;
> >>
> >> -  /* Reject non-NULL pointers paired with a zero count: the pointer
> >> -   * is meaningless and indicates inconsistent input from userspace.
> >> -   */
> >> -  if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles)
> >> -          return -EINVAL;
> >> -
> >> -  if (!wait_info->num_syncobj_timeline_handles &&
> >> -      (wait_info->syncobj_timeline_handles || wait_info-
> >syncobj_timeline_points))
> >> -          return -EINVAL;
> >> -
> >> -  if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles)
> >> -          return -EINVAL;
> >> -
> >> -  if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles)
> >> -          return -EINVAL;
> >> -
> >> -  if (!wait_info->num_fences && wait_info->out_fences)
> >> -          return -EINVAL;
> >> -
> > Mhm, in general such checks look valid to me.
> >
> > My educated guess is that userspace sets num_fences = 0 to query if it 
> > needs to
> resize the pointer out_fences or not.
> >
> > If you have time please double check which check fails here.
>
> Sure, i will check on that but for now i have pushed this revert.
>
> regards
>
> sunil khatri
>
> >
> > Apart from that Reviewed-by: Christian König <[email protected]>.
> >
> > Regards,
> > Christian.
> >
> >>    num_syncobj = wait_info->num_syncobj_handles;
> >>    ptr = u64_to_user_ptr(wait_info->syncobj_handles);
> >>    syncobj_handles = memdup_array_user(ptr, num_syncobj,
> >> sizeof(u32));

RE: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument validation"

Reply via email to