[Public]

Hi Prike,

Regarding below check in the Kernel patch,

/* Reject non-NULL pointers paired with a zero count. */
if (!args->num_syncobj_handles && args->syncobj_handles)
          return -EINVAL;

Mesa uses alloca for args->syncobj_handles, alloca(0) returns non NULL.


I think the check "Reject non-NULL pointers paired with a zero count" in Kernel 
can be skipped.


Thank you,
Yogesh

________________________________
From: Liang, Prike <[email protected]>
Sent: Thursday, March 19, 2026 8:03 AM
To: Khatri, Sunil <[email protected]>; Zhang, Jesse(Jie) 
<[email protected]>; Khatri, Sunil <[email protected]>; Koenig, Christian 
<[email protected]>; Deucher, Alexander <[email protected]>; 
Mohan Marimuthu, Yogesh <[email protected]>; Olsak, Marek 
<[email protected]>
Cc: [email protected] <[email protected]>
Subject: RE: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument 
validation"

[Public]

Add @Mohan Marimuthu, Yogesh/@Olsak, Marek

It looks like the validation in several places doesn’t match how Mesa allocates 
these buffers. i.e when num_syncobj_handles is zero, syncobj_handles must not 
be required to be NULL, because Mesa leaves it uninitialized when allocating it 
on the stack. We should either implement a proper fix in Mesa for this case or 
drop the known broken validation check.

Regards,
      Prike

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Khatri, 
> Sunil
> Sent: Wednesday, March 18, 2026 5:39 PM
> To: Zhang, Jesse(Jie) <[email protected]>; Khatri, Sunil
> <[email protected]>; Koenig, Christian <[email protected]>;
> Deucher, Alexander <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument
> validation"
>
>
> On 18-03-2026 03:02 pm, Zhang, Jesse(Jie) wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >> -----Original Message-----
> >> From: Khatri, Sunil <[email protected]>
> >> Sent: Wednesday, March 18, 2026 4:22 PM
> >> To: Koenig, Christian <[email protected]>; Khatri, Sunil
> >> <[email protected]>; Deucher, Alexander
> >> <[email protected]>
> >> Cc: [email protected]; Zhang, Jesse(Jie)
> >> <[email protected]>
> >> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl
> >> argument validation"
> >>
> >>
> >> On 18-03-2026 01:29 pm, Christian König wrote:
> >>> On 3/18/26 08:47, Sunil Khatri wrote:
> >>>> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e.
> >>>>
> >>>> The patch has caused regression for userqueues where user is stuck
> >>>> and is waiting for fences and a gpu reset is triggered in kernel.
> >>>> Also for any of the parameters when count is zero, the driver does
> >>>> not read from the pointer and having that check is overkill.
> >>>>
> >>>> Application:
> >>>> MESA: error: amdgpu: getting wait num_fences failed
> >>>> MESA: error: amdgpu: getting wait fences failed
> >>>> MESA: error: amdgpu: getting wait num_fences failed
> >>>> MESA: error: amdgpu: getting wait fences failed
> > After I reverted this patch, the error still occurs when running glxgears.
> > Does it work fine on your end if you don't apply this patch?
> >
> > amdgpu: getting wait fences failed
> > amdgpu: getting wait fences failed
> > amdgpu: getting wait fences failed
>
> Yes, it works. You might need to update mesa too. I am using the latest mesa 
> with
> ubuntu and i dont see those error. with your patch they do show.
>
> Regards
> Sunil Khatri
> >
> > Thanks
> > Jesse
> >
> >
> >>>> Dmesg:
> >>>> [  122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail
> >>>> 0x00000000, type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [
> >>>> 122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000,
> >>>> type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [  124.687518]
> >>>> amdgpu 0000:0a:00.0: Dumping IP State [  124.688351] amdgpu
> >>>> 0000:0a:00.0: Dumping IP State Completed [  124.688355] amdgpu
> >>>> 0000:0a:00.0: [drm] AMDGPU device coredump file has been created [
> >>>> 124.688357] amdgpu 0000:0a:00.0: [drm] Check your
> >>>> /sys/class/drm/card0/device/devcoredump/data
> >>>> [  124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout,
> >>>> signaled seq=569, emitted seq=571 [  124.688366] amdgpu
> >>>> 0000:0a:00.0:  Process Xwayland pid 3471 thread Xwayland:cs0 pid
> >>>> 3479 [  124.688369] amdgpu
> >>>> 0000:0a:00.0: Starting gfx_0.0.0 ring reset [  126.560451] amdgpu
> >>>> 0000:0a:00.0: MES(0) failed to respond to msg=RESET [  126.560456]
> >>>> amdgpu 0000:0a:00.0: failed to detect and reset [  126.560460]
> >>>> amdgpu
> >>>> 0000:0a:00.0: Failed to detect and reset queues, err (-110) [
> >>>> 128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed [
> >>>> 128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  1 [
> >>>> 128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled,
> >>>> skipping HW
> >> reset [  128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
> >>>> [  128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but
> >>>> recovered through reset [  128.804574] amdgpu 0000:0a:00.0: GPU
> >>>> reset begin!. Source:  6 [  128.816663] amdgpu 0000:0a:00.0:
> >>>> Dumping IP State [  128.817458] amdgpu 0000:0a:00.0: Dumping IP
> >>>> State Completed [  130.963939] amdgpu 0000:0a:00.0: MES(1) failed
> >>>> to respond to msg=REMOVE_QUEUE [  130.963949] amdgpu 0000:0a:00.0:
> >>>> failed to unmap legacy queue
> >>>>
> >>>> Cc: Jesse Zhang <[email protected]>
> >>>> Signed-off-by: Sunil Khatri <[email protected]>
> >>>> ---
> >>>>    .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 29 -------------------
> >>>>    1 file changed, 29 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>>> index 3fcd70a38374..0d9a13081f2f 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>>> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct
> >>>> drm_device
> >> *dev, void *data,
> >>>>         args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
> >>>>             return -EINVAL;
> >>>>
> >>>> -  /* Reject non-NULL pointers paired with a zero count. */
> >>>> -  if (!args->num_syncobj_handles && args->syncobj_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!args->num_bo_read_handles && args->bo_read_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!args->num_bo_write_handles && args->bo_write_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>>     num_syncobj_handles = args->num_syncobj_handles;
> >>>>     syncobj_handles = memdup_array_user(u64_to_user_ptr(args-
> >>> syncobj_handles),
> >>>>                                         num_syncobj_handles,
> >>>> sizeof(u32)); @@ -
> >> 950,25 +940,6 @@
> >>>> int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> >>>>         wait_info->num_bo_read_handles >
> >> AMDGPU_USERQ_MAX_HANDLES)
> >>>>             return -EINVAL;
> >>>>
> >>>> -  /* Reject non-NULL pointers paired with a zero count: the pointer
> >>>> -   * is meaningless and indicates inconsistent input from userspace.
> >>>> -   */
> >>>> -  if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!wait_info->num_syncobj_timeline_handles &&
> >>>> -      (wait_info->syncobj_timeline_handles || wait_info-
> >>> syncobj_timeline_points))
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles)
> >>>> -          return -EINVAL;
> >>>> -
> >>>> -  if (!wait_info->num_fences && wait_info->out_fences)
> >>>> -          return -EINVAL;
> >>>> -
> >>> Mhm, in general such checks look valid to me.
> >>>
> >>> My educated guess is that userspace sets num_fences = 0 to query if
> >>> it needs to
> >> resize the pointer out_fences or not.
> >>> If you have time please double check which check fails here.
> >> Sure, i will check on that but for now i have pushed this revert.
> >>
> >> regards
> >>
> >> sunil khatri
> >>
> >>> Apart from that Reviewed-by: Christian König <[email protected]>.
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>     num_syncobj = wait_info->num_syncobj_handles;
> >>>>     ptr = u64_to_user_ptr(wait_info->syncobj_handles);
> >>>>     syncobj_handles = memdup_array_user(ptr, num_syncobj,
> >>>> sizeof(u32));

Reply via email to