On 18-03-2026 01:29 pm, Christian König wrote:

On 3/18/26 08:47, Sunil Khatri wrote:
This reverts commit 0cdff8eb31c139dde4716e4aa37198c16364629e.

The patch has caused regression for userqueues where user is stuck
and is waiting for fences and a gpu reset is triggered in kernel.
Also for any of the parameters when count is zero, the driver does
not read from the pointer and having that check is overkill.

Application:
MESA: error: amdgpu: getting wait num_fences failed
MESA: error: amdgpu: getting wait fences failed
MESA: error: amdgpu: getting wait num_fences failed
MESA: error: amdgpu: getting wait fences failed

Dmesg:
[  122.668493] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type 1, 
sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
[  122.668504] amdgpu 0000:0a:00.0: sq_intr: error, detail 0x00000000, type 1, 
sh 1, priv 0, wave_id 0, simd_id 0, wgp_id 0
[  124.687518] amdgpu 0000:0a:00.0: Dumping IP State
[  124.688351] amdgpu 0000:0a:00.0: Dumping IP State Completed
[  124.688355] amdgpu 0000:0a:00.0: [drm] AMDGPU device coredump file has been 
created
[  124.688357] amdgpu 0000:0a:00.0: [drm] Check your 
/sys/class/drm/card0/device/devcoredump/data
[  124.688361] amdgpu 0000:0a:00.0: ring gfx_0.0.0 timeout, signaled seq=569, 
emitted seq=571
[  124.688366] amdgpu 0000:0a:00.0:  Process Xwayland pid 3471 thread 
Xwayland:cs0 pid 3479
[  124.688369] amdgpu 0000:0a:00.0: Starting gfx_0.0.0 ring reset
[  126.560451] amdgpu 0000:0a:00.0: MES(0) failed to respond to msg=RESET
[  126.560456] amdgpu 0000:0a:00.0: failed to detect and reset
[  126.560460] amdgpu 0000:0a:00.0: Failed to detect and reset queues, err 
(-110)
[  128.789840] amdgpu 0000:0a:00.0: Ring gfx_0.0.0 reset failed
[  128.789848] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  1
[  128.790161] amdgpu 0000:0a:00.0: Guilty job already signaled, skipping HW 
reset
[  128.790174] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
[  128.804538] amdgpu 0000:0a:00.0: [drm] device wedged, but recovered through 
reset
[  128.804574] amdgpu 0000:0a:00.0: GPU reset begin!. Source:  6
[  128.816663] amdgpu 0000:0a:00.0: Dumping IP State
[  128.817458] amdgpu 0000:0a:00.0: Dumping IP State Completed
[  130.963939] amdgpu 0000:0a:00.0: MES(1) failed to respond to msg=REMOVE_QUEUE
[  130.963949] amdgpu 0000:0a:00.0: failed to unmap legacy queue

Cc: Jesse Zhang <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
---
  .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 29 -------------------
  1 file changed, 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 3fcd70a38374..0d9a13081f2f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -484,16 +484,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void 
*data,
            args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
                return -EINVAL;
- /* Reject non-NULL pointers paired with a zero count. */
-       if (!args->num_syncobj_handles && args->syncobj_handles)
-               return -EINVAL;
-
-       if (!args->num_bo_read_handles && args->bo_read_handles)
-               return -EINVAL;
-
-       if (!args->num_bo_write_handles && args->bo_write_handles)
-               return -EINVAL;
-
        num_syncobj_handles = args->num_syncobj_handles;
        syncobj_handles = 
memdup_array_user(u64_to_user_ptr(args->syncobj_handles),
                                            num_syncobj_handles, sizeof(u32));
@@ -950,25 +940,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void 
*data,
            wait_info->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES)
                return -EINVAL;
- /* Reject non-NULL pointers paired with a zero count: the pointer
-        * is meaningless and indicates inconsistent input from userspace.
-        */
-       if (!wait_info->num_syncobj_handles && wait_info->syncobj_handles)
-               return -EINVAL;
-
-       if (!wait_info->num_syncobj_timeline_handles &&
-           (wait_info->syncobj_timeline_handles || 
wait_info->syncobj_timeline_points))
-               return -EINVAL;
-
-       if (!wait_info->num_bo_read_handles && wait_info->bo_read_handles)
-               return -EINVAL;
-
-       if (!wait_info->num_bo_write_handles && wait_info->bo_write_handles)
-               return -EINVAL;
-
-       if (!wait_info->num_fences && wait_info->out_fences)
-               return -EINVAL;
-
Mhm, in general such checks look valid to me.

My educated guess is that userspace sets num_fences = 0 to query if it needs to 
resize the pointer out_fences or not.

If you have time please double check which check fails here.

Sure, i will check on that but for now i have pushed this revert.

regards

sunil khatri


Apart from that Reviewed-by: Christian König <[email protected]>.

Regards,
Christian.

        num_syncobj = wait_info->num_syncobj_handles;
        ptr = u64_to_user_ptr(wait_info->syncobj_handles);
        syncobj_handles = memdup_array_user(ptr, num_syncobj, sizeof(u32));

Reply via email to