[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Khatri, Sunil <[email protected]> > Sent: Wednesday, March 18, 2026 4:22 PM > To: Koenig, Christian <[email protected]>; Khatri, Sunil > <[email protected]>; Deucher, Alexander <[email protected]> > Cc: [email protected]; Zhang, Jesse(Jie) <[email protected]> > Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument > validation" > > > On 18-03-2026 01:29 pm, Christian König wrote: > > > > On 3/18/26 08:47, Sunil Khatri wrote: > >> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e. > >> > >> The patch has caused regression for userqueues where user is stuck > >> and is waiting for fences and a gpu reset is triggered in kernel. > >> Also for any of the parameters when count is zero, the driver does > >> not read from the pointer and having that check is overkill. > >> > >> Application: > >> MESA: error: amdgpu: getting wait num_fences failed > >> MESA: error: amdgpu: getting wait fences failed > >> MESA: error: amdgpu: getting wait num_fences failed > >> MESA: error: amdgpu: getting wait fences failed After I reverted this patch, the error still occurs when running glxgears. Does it work fine on your end if you don't apply this patch?
amdgpu: getting wait fences failed amdgpu: getting wait fences failed amdgpu: getting wait fences failed Thanks Jesse > >> > >> Dmesg: > >> [ 122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail > >> 0x00000000, type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [ > >> 122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, > >> type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [ 124.687518] > >> amdgpu 0000:0a:00.0: Dumping IP State [ 124.688351] amdgpu > >> 0000:0a:00.0: Dumping IP State Completed [ 124.688355] amdgpu > >> 0000:0a:00.0: [drm] AMDGPU device coredump file has been created [ > >> 124.688357] amdgpu 0000:0a:00.0: [drm] Check your > >> /sys/class/drm/card0/device/devcoredump/data > >> [ 124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, signaled > >> seq=569, emitted seq=571 [ 124.688366] amdgpu 0000:0a:00.0: Process > >> Xwayland pid 3471 thread Xwayland:cs0 pid 3479 [ 124.688369] amdgpu > >> 0000:0a:00.0: Starting gfx_0.0.0 ring reset [ 126.560451] amdgpu > >> 0000:0a:00.0: MES(0) failed to respond to msg=RESET [ 126.560456] > >> amdgpu 0000:0a:00.0: failed to detect and reset [ 126.560460] amdgpu > >> 0000:0a:00.0: Failed to detect and reset queues, err (-110) [ > >> 128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed [ > >> 128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source: 1 [ > >> 128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, skipping HW > reset [ 128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded! > >> [ 128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but > >> recovered through reset [ 128.804574] amdgpu 0000:0a:00.0: GPU reset > >> begin!. Source: 6 [ 128.816663] amdgpu 0000:0a:00.0: Dumping IP > >> State [ 128.817458] amdgpu 0000:0a:00.0: Dumping IP State Completed > >> [ 130.963939] amdgpu 0000:0a:00.0: MES(1) failed to respond to > >> msg=REMOVE_QUEUE [ 130.963949] amdgpu 0000:0a:00.0: failed to unmap > >> legacy queue > >> > >> Cc: Jesse Zhang <[email protected]> > >> Signed-off-by: Sunil Khatri <[email protected]> > >> --- > >> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 ------------------- > >> 1 file changed, 29 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c > >> index 3fcd70a38374..0d9a13081f2f 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c > >> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device > *dev, void *data, > >> args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES) > >> return -EINVAL; > >> > >> - /* Reject non-NULL pointers paired with a zero count. */ > >> - if (!args->num_syncobj_handles && args->syncobj_handles) > >> - return -EINVAL; > >> - > >> - if (!args->num_bo_read_handles && args->bo_read_handles) > >> - return -EINVAL; > >> - > >> - if (!args->num_bo_write_handles && args->bo_write_handles) > >> - return -EINVAL; > >> - > >> num_syncobj_handles = args->num_syncobj_handles; > >> syncobj_handles = memdup_array_user(u64_to_user_ptr(args- > >syncobj_handles), > >> num_syncobj_handles, sizeof(u32)); > >> @@ - > 950,25 +940,6 @@ > >> int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, > >> wait_info->num_bo_read_handles > > AMDGPU_USERQ_MAX_HANDLES) > >> return -EINVAL; > >> > >> - /* Reject non-NULL pointers paired with a zero count: the pointer > >> - * is meaningless and indicates inconsistent input from userspace. > >> - */ > >> - if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles) > >> - return -EINVAL; > >> - > >> - if (!wait_info->num_syncobj_timeline_handles && > >> - (wait_info->syncobj_timeline_handles || wait_info- > >syncobj_timeline_points)) > >> - return -EINVAL; > >> - > >> - if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles) > >> - return -EINVAL; > >> - > >> - if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles) > >> - return -EINVAL; > >> - > >> - if (!wait_info->num_fences && wait_info->out_fences) > >> - return -EINVAL; > >> - > > Mhm, in general such checks look valid to me. > > > > My educated guess is that userspace sets num_fences = 0 to query if it > > needs to > resize the pointer out_fences or not. > > > > If you have time please double check which check fails here. > > Sure, i will check on that but for now i have pushed this revert. > > regards > > sunil khatri > > > > > Apart from that Reviewed-by: Christian König <[email protected]>. > > > > Regards, > > Christian. > > > >> num_syncobj = wait_info->num_syncobj_handles; > >> ptr = u64_to_user_ptr(wait_info->syncobj_handles); > >> syncobj_handles = memdup_array_user(ptr, num_syncobj, > >> sizeof(u32));
