On 3/24/26 17:43, SHANMUGAM, SRINIVASAN wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > >> -----Original Message----- >> From: Koenig, Christian <[email protected]> >> Sent: Tuesday, March 24, 2026 9:34 PM >> To: SHANMUGAM, SRINIVASAN <[email protected]>; >> Deucher, Alexander <[email protected]> >> Cc: [email protected] >> Subject: Re: [PATCH v2] drm/amdgpu: Fix NULL bo_va dereference in VA clear >> path >> >> On 3/24/26 16:35, SHANMUGAM, SRINIVASAN wrote: >>> [Public] >>> >>> Hi Christian, >>> >>>> -----Original Message----- >>>> From: Koenig, Christian <[email protected]> >>>> Sent: Tuesday, March 24, 2026 7:43 PM >>>> To: SHANMUGAM, SRINIVASAN <[email protected]>; >> Deucher, >>>> Alexander <[email protected]> >>>> Cc: [email protected] >>>> Subject: Re: [PATCH v2] drm/amdgpu: Fix NULL bo_va dereference in VA >>>> clear path >>>> >>>> On 3/24/26 14:57, Srinivasan Shanmugam wrote: >>>>> amdgpu_gem_va_ioctl() can call amdgpu_gem_va_update_vm() with bo_va >>>>> == NULL for AMDGPU_VA_OP_CLEAR. >>>>> >>>>> CLEAR operates on a VM address range and is not associated with a >>>>> specific BO. In this case, the update helper should perform only >>>>> VM-level updates and must not access BO-specific fields. >>>>> >>>>> Currently, bo_va may be dereferenced in the MAP/REPLACE handling >>>>> paths without explicitly guarding against NULL, which can lead to a >>>>> NULL pointer dereference when CLEAR is processed. >>>>> >>>>> Fix this by making amdgpu_gem_va_update_vm() explicitly handle bo_va >>>>> == NULL: >>>>> - Guard BO-specific accesses with bo_va checks >>>>> - Warn if MAP/REPLACE ever reaches the helper with NULL bo_va >>>>> - Keep VM update path unchanged for CLEAR >>>>> >>>>> This keeps CLEAR on the common update path while ensuring safe >>>>> handling of NULL bo_va. >>>>> >>>>> Crash signature: >>>>> [ 325.716062] [IGT] amd_bo: executing [ 325.779102] >>>>> >>>> >> ============================================================ >>>> ====== >>>>> [ 325.786483] BUG: KASAN: null-ptr-deref in >>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [ 325.795105] Write of >>>>> size >>>>> 4 at addr 0000000000000000 by task amd_bo/7893 [ 325.801997] [ >>>>> 325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted >>>>> 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 >>>>> PREEMPT(voluntary) [ 325.803602] Hardware name: TYAN B8021G88V2HR- >>>> 2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 325.803606] Call Trace: >>>>> [ 325.803609] <TASK> >>>>> [ 325.803612] dump_stack_lvl+0x64/0x80 [ 325.803623] >>>>> kasan_report+0xb8/0xf0 [ 325.803631] ? >>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [ 325.804427] >>>>> kasan_check_range+0x105/0x1b0 [ 325.804432] >>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [ 325.805229] ? >>>>> __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu] [ 325.806022] ? >>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 325.806815] ? >>>>> __pfx___drm_dev_dbg+0x10/0x10 [drm] [ 325.806894] ? >>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 325.807686] >>>>> drm_ioctl_kernel+0x13d/0x2b0 [drm] [ 325.807767] ? >>>>> __pfx_file_has_perm+0x10/0x10 [ 325.807777] ? >>>>> __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ 325.807857] >>>>> drm_ioctl+0x4be/0xae0 [drm] [ 325.807936] ? >>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 325.808728] ? >>>>> __pfx_sock_write_iter+0x10/0x10 [ 325.808737] ? >>>>> __pfx_drm_ioctl+0x10/0x10 [drm] [ 325.808816] ? >>>>> ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490 >>>>> [ 325.808823] ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10 >>>>> [ 325.808827] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 325.808835] ? >>>>> __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>>> [ 325.808841] amdgpu_drm_ioctl+0xce/0x180 [amdgpu] [ 325.809622] >>>>> __x64_sys_ioctl+0x139/0x1c0 [ 325.809630] do_syscall_64+0x64/0x880 >>>>> [ 325.809638] entry_SYSCALL_64_after_hwframe+0x76/0x7e >>>>> [ 325.809645] RIP: 0033:0x7f205fd12e1d [ 325.809650] Code: 04 25 >>>>> 28 >>>>> 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 >>>>> b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff >>>>> ff >>>>> 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 [ 325.809654] RSP: >>>>> 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> [ >>>>> 325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: >>>>> 00007f205fd12e1d [ 325.809663] RDX: 00007ffe9032b5b0 RSI: >>>>> 00000000c0406448 RDI: 0000000000000006 [ 325.809665] RBP: >>>>> 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e [ >>>>> 325.809668] R10: 0000000000000000 R11: 0000000000000246 R12: >>>>> 00000000c0406448 [ 325.809670] R13: 0000000000000006 R14: >>>>> 0000000000001000 R15: 0000000000000001 [ 325.809675] </TASK> [ >>>>> 325.809678] >>>>> >>>> >> ============================================================ >>>> ====== >>>>> >>>>> Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR >> v2") >>>>> Cc: Christian König <[email protected]> >>>>> Cc: Alex Deucher <[email protected]> >>>>> Signed-off-by: Srinivasan Shanmugam <[email protected]> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +++++++++-- >>>>> 1 file changed, 9 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> index b0ba2bdaf43a..145cb222d5cf 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> @@ -759,9 +759,15 @@ amdgpu_gem_va_update_vm(struct amdgpu_device >>>> *adev, >>>>> if (r) >>>>> goto error; >>>>> >>>>> - /* For MAP/REPLACE we also need to update the BO mappings. */ >>>>> + /* For MAP/REPLACE we also need to update the BO mappings. >>>>> + * CLEAR operates on the VM address range only and can come in with >>>>> + * bo_va == NULL. >>>>> + */ >>>>> if (operation == AMDGPU_VA_OP_MAP || >>>>> operation == AMDGPU_VA_OP_REPLACE) { >>>>> + if (WARN_ON_ONCE(!bo_va)) >>>>> + goto error; >>>>> + >>>>> r = amdgpu_vm_bo_update(adev, bo_va, false); >>>>> if (r) >>>>> goto error; >>>>> @@ -772,7 +778,8 @@ amdgpu_gem_va_update_vm(struct amdgpu_device >>>> *adev, >>>>> if (r) >>>>> goto error; >>>>> >>>>> - if ((operation == AMDGPU_VA_OP_MAP || >>>>> + if (bo_va && >>>>> + (operation == AMDGPU_VA_OP_MAP || >>>>> operation == AMDGPU_VA_OP_REPLACE) && >>>> >>>> Something else must be broken here. We already check operation == >>>> AMDGPU_VA_OP_MAP or AMDGPU_VA_OP_REPLACE. >>>> >>>> That should be enough to Ensure that bo_va isn't NULL. >>> >>> Thanks for the review and the clarification. please correct me if I am >>> mistaken. >>> >>> - For MAP/REPLACE, we are working with a real buffer (BO) >> >> Not quite for PRT the BO is NULL. > > Thanks for the clarification. > > Ah, got it. > > So for PRT, bo_va is valid but bo_va->bo is NULL. > >> >>> - So bo_va should always be valid in those cases >>> - bo_va should be NULL only for CLEAR or PRT, where no real buffer is >>> used >> >> No, bo_va is a valid pointer for MAP/REPLACE, but for clear it is NULL. >> >> For PRT bo_va->bo is NULL. > > Thanks, that helps clarify. > > So: > - CLEAR → bo_va = NULL > - PRT → bo_va is valid, but bo_va->bo = NULL > - MAP/REPLACE → bo_va should always be valid > > I was incorrectly treating PRT similar to CLEAR earlier. > >> >>> >>> Based on this, it seems the issue may not be in >>> amdgpu_gem_va_update_vm(), but earlier in amdgpu_gem_va_ioctl(), in the part >> where we decide: >>> >>> - which buffer to use (abo) >>> - and where it is mapped (bo_va) >>> >>> This is how I currently understand the flow: >>> >>> 1. First, we check the operation and flags >>> >>> - If it is NOT CLEAR and NOT PRT, >>> we take the normal path and get the real buffer (abo) from the >>> handle >> >> Ah, I see. Yeah, that is most likely wrong. >> >> We should use the special PRT bo_va in fpriv for that case, otherwise >> userqueues >> would have quite a problem. > > Got it, thanks. > > So for the PRT case, we should use fpriv->prt_va instead of treating it like > a NULL case. > > I will fix the ioctl() path to ensure PRT uses prt_va correctly. > > Based on this, I will focus on fixing the bo_va selection in > amdgpu_gem_va_ioctl(), > so that PRT uses prt_va and bo_va is not incorrectly NULL. > > Please let me know if this direction looks correct.
That sounds reasonable to me, yes. Thanks, Christian. > > Best regards, > Srini > >> >> Thanks, >> Christian.
