A general protection fault occurs when the user queue fence driver
signals DMA fences and a callback is invoked through an invalid
function pointer. This indicates a use-after-free: fence objects
are accessed after being freed.

The issue occurs because:
1. Fences may be signaled multiple times if they remain in the
   fence list after signaling
2. Fence objects may be freed while still referenced in the list
3. The fence list isn't properly validated before processing
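
The lifetime hazard in points 1 and 2 can be illustrated with a small
userspace model. This is only a sketch: struct toy_fence and its
helpers are hypothetical names and are not part of the kernel's
dma_fence API. A "freed" flag stands in for the actual free so that
the model stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted fence; not struct dma_fence. */
struct toy_fence {
    int refcount;
    bool freed;     /* set instead of calling free() */
    bool on_list;   /* true while the pending list still points here */
};

/* Drop one reference; mark the object freed when the count hits zero. */
static void toy_fence_put(struct toy_fence *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->freed = true;
}

/* A list entry dangles when the list still points at a freed object. */
static bool toy_fence_dangling(const struct toy_fence *f)
{
    return f->on_list && f->freed;
}

/* Safe removal order: unlink first, then drop the list's reference. */
static void toy_fence_remove(struct toy_fence *f)
{
    f->on_list = false;
    toy_fence_put(f);
}
```

In this model, dropping the last reference while on_list is still true
leaves a dangling entry, which is the state a later list walk would
trip over.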

Add the necessary safeguards:
- Check whether the fence is already signaled before signaling it
- Validate the fence ops structure before accessing callback pointers
- Keep the list_for_each_entry_safe walk and drop the list's
  reference only after unlinking the entry
- Add WARN_ON for debugging corrupted fence states

This prevents the GPF by ensuring we only process valid, unsignaled
fences and properly handle already-signaled or corrupted entries.
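
The control flow of the guarded loop can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
driver code: struct model_fence, model_fence_signal() and
model_process() are hypothetical names, the list is singly linked, and
locking and the ops validation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a fence on the pending list. */
struct model_fence {
    uint64_t seqno;
    bool signaled;
    int refcount;
    int signal_calls;          /* times this model signaled the fence */
    struct model_fence *next;  /* singly linked pending list */
};

/* Signal at most once; returns false if the fence was already signaled. */
static bool model_fence_signal(struct model_fence *f)
{
    if (f->signaled)
        return false;
    f->signaled = true;
    f->signal_calls++;
    return true;
}

/*
 * Walk the pending list from the head. Unsignaled fences are signaled
 * once rptr has reached their seqno; already-signaled fences are only
 * unlinked. Returns the number of entries removed from the list.
 */
static int model_process(struct model_fence **head, uint64_t rptr)
{
    int removed = 0;

    while (*head) {
        struct model_fence *f = *head;

        if (!f->signaled) {
            if (rptr < f->seqno)
                break;          /* not reached yet, stop the walk */
            model_fence_signal(f);
        }

        *head = f->next;        /* unlink before dropping the reference */
        f->next = NULL;
        assert(f->refcount > 0);
        f->refcount--;
        removed++;
    }
    return removed;
}
```

Calling model_process() a second time with the same rptr removes
nothing, because every processed entry has already left the list.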

0xdeadbeafdeadbeaf: 0000 [#1] SMP NOPTI
[  353.889511] CPU: 22 UID: 0 PID: 0 Comm: swapper/22 Tainted: G            E       6.16.0+ #15 PREEMPT(voluntary)
[  353.889531] Tainted: [E]=UNSIGNED_MODULE
[  353.889539] Hardware name: AMD Splinter/Splinter-GNR, BIOS WS54117N_140 01/16/2024
[  353.889552] RIP: 0010:dma_fence_signal_timestamp_locked+0x7c/0x110
[  353.889570] Code: 10 f0 80 4f 30 04 66 90 48 8b 75 d0 48 8b 1e 48 89 f0 4c 39 ee 75 05 eb 24 48 89 d3 48 89 06 4c 89 e7 48 89 46 08 48 8b 46 10 <ff> d0 0f 1f 00 48 8b 13 48 89 d8 48 89 de 4c 39 eb 75 dc 31 c0 48
[  353.889593] RSP: 0018:ffffc0840078cd30 EFLAGS: 00010087
[  353.889606] RAX: deadbeafdeadbeaf RBX: ffffc0840078cd30 RCX: 0000000000000018
[  353.889617] RDX: 00000000000216c8 RSI: ffff9ed014558160 RDI: ffff9ed00bab3680
[  353.889628] RBP: ffffc0840078cd60 R08: 0000000000000000 R09: 0000000000000000
[  353.889639] R10: 0000000000001808 R11: 0000000000000001 R12: ffff9ed00bab3680
[  353.889650] R13: ffffc0840078cd30 R14: ffff9ed00bab3680 R15: 0000000000000014
[  353.889661] FS:  0000000000000000(0000) GS:ffff9ed198865000(0000) knlGS:0000000000000000
[  353.889674] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  353.889684] CR2: 00007f44ebebf000 CR3: 0000000108240000 CR4: 0000000000750ef0
[  353.889696] PKRU: 55555554
[  353.889703] Call Trace:
[  353.889711]  <IRQ>
[  353.889722]  dma_fence_signal+0x35/0x70
[  353.889738]  amdgpu_userq_fence_driver_process.part.0+0x67/0x150 [amdgpu]
[  353.889995]  amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
[  353.890204]  gfx_v11_0_eop_irq+0x137/0x180 [amdgpu]
[  353.890345]  amdgpu_irq_dispatch+0x1b2/0x2f0 [amdgpu]
[  353.890452]  ? sched_clock+0x14/0x30
[  353.890462]  amdgpu_ih_process+0x8d/0x1f0 [amdgpu]
[  353.890566]  amdgpu_irq_handler+0x28/0x60 [amdgpu]
[  353.890667]  __handle_irq_event_percpu+0x50/0x1b0
[  353.890677]  handle_irq_event_percpu+0x19/0x60
[  353.890683]  handle_irq_event+0x3d/0x60
[  353.890689]  handle_edge_irq+0xa0/0x180
[  353.890696]  __common_interrupt+0x52/0x100
[  353.890703]  common_interrupt+0x9b/0xc0
[  353.890711]  </IRQ>
[  353.890714]  <TASK>

Signed-off-by: Jesse Zhang <[email protected]>
---
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 28 ++++++++++++++-----
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 95e91d1dc58a..e18656d0bee3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -163,16 +163,30 @@ void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_d
        list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
                fence = &userq_fence->base;
 
-               if (rptr < fence->seqno)
-                       break;
+               /* Add sanity check - ensure fence is still valid */
+               if (!dma_fence_is_signaled(fence)) {
+                       if (rptr < fence->seqno)
+                               break;
+
+                       /* Verify the callback function pointer looks reasonable */
+                       if (WARN_ON(!fence->ops || !fence->ops->signaled)) {
+                               /* Remove corrupted fence from list */
+                               list_del(&userq_fence->link);
+                               continue;
+                       }
 
-               dma_fence_signal(fence);
+                       dma_fence_signal(fence);
 
-               for (i = 0; i < userq_fence->fence_drv_array_count; i++)
-                       amdgpu_userq_fence_driver_put(userq_fence->fence_drv_array[i]);
+                       for (i = 0; i < userq_fence->fence_drv_array_count; i++)
+                               amdgpu_userq_fence_driver_put(userq_fence->fence_drv_array[i]);
 
-               list_del(&userq_fence->link);
-               dma_fence_put(fence);
+                       list_del(&userq_fence->link);
+                       dma_fence_put(fence);
+               } else {
+                       /* Fence was already signaled, remove from list */
+                       list_del(&userq_fence->link);
+                       dma_fence_put(fence);
+               }
        }
        spin_unlock(&fence_drv->fence_list_lock);
 }
-- 
2.49.0
