[AMD Official Use Only - AMD Internal Distribution Only] Hi,
alloca(0) will point to current top of stack and hence it can be said the field is initialized. Thank you, Yogesh ________________________________ From: Koenig, Christian <[email protected]> Sent: Thursday, March 19, 2026 1:00 PM To: Liang, Prike <[email protected]>; Mohan Marimuthu, Yogesh <[email protected]>; Khatri, Sunil <[email protected]>; Zhang, Jesse(Jie) <[email protected]>; Deucher, Alexander <[email protected]>; Olsak, Marek <[email protected]> Cc: [email protected] <[email protected]> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument validation" Hi guys, well when mesa leaves some fields in the structure uninitialized then that is a pretty bad idea and we should eventually fix that. But always setting the pointers to valid arrays and just setting the number of array elements to zero is perfectly valid. That doesn't even needs a debug message. Regards, Christian. On 3/19/26 07:27, Liang, Prike wrote: > [Public] > > > Thanks for the confirmation. If Mesa doesn’t zero the handle buffer, I’m > going to drop this validation check in the kernel and then leave a debug > message for this case. > > > > Regards, > > Prike > > > > *From:*Mohan Marimuthu, Yogesh <[email protected]> > *Sent:* Thursday, March 19, 2026 1:31 PM > *To:* Liang, Prike <[email protected]>; Khatri, Sunil > <[email protected]>; Zhang, Jesse(Jie) <[email protected]>; Khatri, > Sunil <[email protected]>; Koenig, Christian <[email protected]>; > Deucher, Alexander <[email protected]>; Olsak, Marek > <[email protected]> > *Cc:* [email protected] > *Subject:* Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument > validation" > > > > [Public] > > > > Hi Prike, > > > > Regarding below check in the Kernel patch, > > > > /* Reject non-NULL pointers paired with a zero count. */ > > if (!args->num_syncobj_handles && args->syncobj_handles) > > return -EINVAL; > > > > Mesa uses alloca for args->syncobj_handles, alloca(0) returns non NULL. > > > > > > I think the check "Reject non-NULL pointers paired with a zero count" in > Kernel can be skipped. > > > > > > Thank you, > > Yogesh > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > *From:* Liang, Prike <[email protected] <mailto:[email protected]>> > *Sent:* Thursday, March 19, 2026 8:03 AM > *To:* Khatri, Sunil <[email protected] <mailto:[email protected]>>; > Zhang, Jesse(Jie) <[email protected] <mailto:[email protected]>>; Khatri, > Sunil <[email protected] <mailto:[email protected]>>; Koenig, Christian > <[email protected] <mailto:[email protected]>>; Deucher, > Alexander <[email protected] <mailto:[email protected]>>; > Mohan Marimuthu, Yogesh <[email protected] > <mailto:[email protected]>>; Olsak, Marek <[email protected] > <mailto:[email protected]>> > *Cc:* [email protected] <mailto:[email protected]> > <[email protected] <mailto:[email protected]>> > *Subject:* RE: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument > validation" > > > > [Public] > > Add @Mohan Marimuthu, Yogesh/@Olsak, Marek > > It looks like the validation in several places doesn’t match how Mesa > allocates these buffers. i.e when num_syncobj_handles is zero, > syncobj_handles must not be required to be NULL, because Mesa leaves it > uninitialized when allocating it on the stack. We should either implement a > proper fix in Mesa for this case or drop the known broken validation check. > > Regards, > Prike > >> -----Original Message----- >> From: amd-gfx <[email protected] >> <mailto:[email protected]>> On Behalf Of Khatri, Sunil >> Sent: Wednesday, March 18, 2026 5:39 PM >> To: Zhang, Jesse(Jie) <[email protected] <mailto:[email protected]>>; >> Khatri, Sunil >> <[email protected] <mailto:[email protected]>>; Koenig, Christian >> <[email protected] <mailto:[email protected]>>; >> Deucher, Alexander <[email protected] >> <mailto:[email protected]>> >> Cc: [email protected] <mailto:[email protected]> >> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl argument >> validation" >> >> >> On 18-03-2026 03:02 pm, Zhang, Jesse(Jie) wrote: >> > [AMD Official Use Only - AMD Internal Distribution Only] >> > >> >> -----Original Message----- >> >> From: Khatri, Sunil <[email protected] <mailto:[email protected]>> >> >> Sent: Wednesday, March 18, 2026 4:22 PM >> >> To: Koenig, Christian <[email protected] >> >> <mailto:[email protected]>>; Khatri, Sunil >> >> <[email protected] <mailto:[email protected]>>; Deucher, Alexander >> >> <[email protected] <mailto:[email protected]>> >> >> Cc: [email protected] <mailto:[email protected]>; >> >> Zhang, Jesse(Jie) >> >> <[email protected] <mailto:[email protected]>> >> >> Subject: Re: [PATCH] Revert "drm/amdgpu: harden SIGNAL/WAIT ioctl >> >> argument validation" >> >> >> >> >> >> On 18-03-2026 01:29 pm, Christian König wrote: >> >>> On 3/18/26 08:47, Sunil Khatri wrote: >> >>>> This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e. >> >>>> >> >>>> The patch has caused regression for userqueues where user is stuck >> >>>> and is waiting for fences and a gpu reset is triggered in kernel. >> >>>> Also for any of the parameters when count is zero, the driver does >> >>>> not read from the pointer and having that check is overkill. >> >>>> >> >>>> Application: >> >>>> MESA: error: amdgpu: getting wait num_fences failed >> >>>> MESA: error: amdgpu: getting wait fences failed >> >>>> MESA: error: amdgpu: getting wait num_fences failed >> >>>> MESA: error: amdgpu: getting wait fences failed >> > After I reverted this patch, the error still occurs when running glxgears. >> > Does it work fine on your end if you don't apply this patch? >> > >> > amdgpu: getting wait fences failed >> > amdgpu: getting wait fences failed >> > amdgpu: getting wait fences failed >> >> Yes, it works. You might need to update mesa too. I am using the latest mesa >> with >> ubuntu and i dont see those error. with your patch they do show. >> >> Regards >> Sunil Khatri >> > >> > Thanks >> > Jesse >> > >> > >> >>>> Dmesg: >> >>>> [ 122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail >> >>>> 0x00000000, type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [ >> >>>> 122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, >> >>>> type 1, sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0 [ 124.687518] >> >>>> amdgpu 0000:0a:00.0: Dumping IP State [ 124.688351] amdgpu >> >>>> 0000:0a:00.0: Dumping IP State Completed [ 124.688355] amdgpu >> >>>> 0000:0a:00.0: [drm] AMDGPU device coredump file has been created [ >> >>>> 124.688357] amdgpu 0000:0a:00.0: [drm] Check your >> >>>> /sys/class/drm/card0/device/devcoredump/data >> >>>> [ 124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, >> >>>> signaled seq=569, emitted seq=571 [ 124.688366] amdgpu >> >>>> 0000:0a:00.0: Process Xwayland pid 3471 thread Xwayland:cs0 pid >> >>>> 3479 [ 124.688369] amdgpu >> >>>> 0000:0a:00.0: Starting gfx_0.0.0 ring reset [ 126.560451] amdgpu >> >>>> 0000:0a:00.0: MES(0) failed to respond to msg=RESET [ 126.560456] >> >>>> amdgpu 0000:0a:00.0: failed to detect and reset [ 126.560460] >> >>>> amdgpu >> >>>> 0000:0a:00.0: Failed to detect and reset queues, err (-110) [ >> >>>> 128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed [ >> >>>> 128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source: 1 [ >> >>>> 128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, >> >>>> skipping HW >> >> reset [ 128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded! >> >>>> [ 128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but >> >>>> recovered through reset [ 128.804574] amdgpu 0000:0a:00.0: GPU >> >>>> reset begin!. Source: 6 [ 128.816663] amdgpu 0000:0a:00.0: >> >>>> Dumping IP State [ 128.817458] amdgpu 0000:0a:00.0: Dumping IP >> >>>> State Completed [ 130.963939] amdgpu 0000:0a:00.0: MES(1) failed >> >>>> to respond to msg=REMOVE_QUEUE [ 130.963949] amdgpu 0000:0a:00.0: >> >>>> failed to unmap legacy queue >> >>>> >> >>>> Cc: Jesse Zhang <[email protected] <mailto:[email protected]>> >> >>>> Signed-off-by: Sunil Khatri <[email protected] >> >>>> <mailto:[email protected]>> >> >>>> --- >> >>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 >> >>>> ------------------- >> >>>> 1 file changed, 29 deletions(-) >> >>>> >> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >> >>>> index 3fcd70a38374..0d9a13081f2f 100644 >> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c >> >>>> @@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct >> >>>> drm_device >> >> *dev, void *data, >> >>>> args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES) >> >>>> return -EINVAL; >> >>>> >> >>>> - /* Reject non-NULL pointers paired with a zero count. */ >> >>>> - if (!args->num_syncobj_handles && args->syncobj_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!args->num_bo_read_handles && args->bo_read_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!args->num_bo_write_handles && args->bo_write_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> num_syncobj_handles = args->num_syncobj_handles; >> >>>> syncobj_handles = memdup_array_user(u64_to_user_ptr(args- >> >>> syncobj_handles), >> >>>> num_syncobj_handles, >> >>>> sizeof(u32)); @@ - >> >> 950,25 +940,6 @@ >> >>>> int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, >> >>>> wait_info->num_bo_read_handles > >> >> AMDGPU_USERQ_MAX_HANDLES) >> >>>> return -EINVAL; >> >>>> >> >>>> - /* Reject non-NULL pointers paired with a zero count: the pointer >> >>>> - * is meaningless and indicates inconsistent input from userspace. >> >>>> - */ >> >>>> - if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!wait_info->num_syncobj_timeline_handles && >> >>>> - (wait_info->syncobj_timeline_handles || wait_info- >> >>> syncobj_timeline_points)) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles) >> >>>> - return -EINVAL; >> >>>> - >> >>>> - if (!wait_info->num_fences && wait_info->out_fences) >> >>>> - return -EINVAL; >> >>>> - >> >>> Mhm, in general such checks look valid to me. >> >>> >> >>> My educated guess is that userspace sets num_fences = 0 to query if >> >>> it needs to >> >> resize the pointer out_fences or not. >> >>> If you have time please double check which check fails here. >> >> Sure, i will check on that but for now i have pushed this revert. >> >> >> >> regards >> >> >> >> sunil khatri >> >> >> >>> Apart from that Reviewed-by: Christian König <[email protected] >> >>> <mailto:[email protected]>>. >> >>> >> >>> Regards, >> >>> Christian. >> >>> >> >>>> num_syncobj = wait_info->num_syncobj_handles; >> >>>> ptr = u64_to_user_ptr(wait_info->syncobj_handles); >> >>>> syncobj_handles = memdup_array_user(ptr, num_syncobj, >> >>>> sizeof(u32)); >
