On 3/24/26 17:43, SHANMUGAM, SRINIVASAN wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Tuesday, March 24, 2026 9:34 PM
>> To: SHANMUGAM, SRINIVASAN <[email protected]>;
>> Deucher, Alexander <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [PATCH v2] drm/amdgpu: Fix NULL bo_va dereference in VA clear
>> path
>>
>> On 3/24/26 16:35, SHANMUGAM, SRINIVASAN wrote:
>>> [Public]
>>>
>>> Hi Christian,
>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian <[email protected]>
>>>> Sent: Tuesday, March 24, 2026 7:43 PM
>>>> To: SHANMUGAM, SRINIVASAN <[email protected]>;
>> Deucher,
>>>> Alexander <[email protected]>
>>>> Cc: [email protected]
>>>> Subject: Re: [PATCH v2] drm/amdgpu: Fix NULL bo_va dereference in VA
>>>> clear path
>>>>
>>>> On 3/24/26 14:57, Srinivasan Shanmugam wrote:
>>>>> amdgpu_gem_va_ioctl() can call amdgpu_gem_va_update_vm() with bo_va
>>>>> == NULL for AMDGPU_VA_OP_CLEAR.
>>>>>
>>>>> CLEAR operates on a VM address range and is not associated with a
>>>>> specific BO. In this case, the update helper should perform only
>>>>> VM-level updates and must not access BO-specific fields.
>>>>>
>>>>> Currently, bo_va may be dereferenced in the MAP/REPLACE handling
>>>>> paths without explicitly guarding against NULL, which can lead to a
>>>>> NULL pointer dereference when CLEAR is processed.
>>>>>
>>>>> Fix this by making amdgpu_gem_va_update_vm() explicitly handle bo_va
>>>>> == NULL:
>>>>> - Guard BO-specific accesses with bo_va checks
>>>>> - Warn if MAP/REPLACE ever reaches the helper with NULL bo_va
>>>>> - Keep VM update path unchanged for CLEAR
>>>>>
>>>>> This keeps CLEAR on the common update path while ensuring safe
>>>>> handling of NULL bo_va.
>>>>>
>>>>> Crash signature:
>>>>> [  325.716062] [IGT] amd_bo: executing [  325.779102]
>>>>>
>>>>
>> ============================================================
>>>> ======
>>>>> [  325.786483] BUG: KASAN: null-ptr-deref in
>>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [  325.795105] Write of
>>>>> size
>>>>> 4 at addr 0000000000000000 by task amd_bo/7893 [  325.801997] [
>>>>> 325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted
>>>>> 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1
>>>>> PREEMPT(voluntary) [  325.803602] Hardware name: TYAN B8021G88V2HR-
>>>> 2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [  325.803606] Call Trace:
>>>>> [  325.803609]  <TASK>
>>>>> [  325.803612]  dump_stack_lvl+0x64/0x80 [  325.803623]
>>>>> kasan_report+0xb8/0xf0 [  325.803631]  ?
>>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [  325.804427]
>>>>> kasan_check_range+0x105/0x1b0 [  325.804432]
>>>>> amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu] [  325.805229]  ?
>>>>> __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu] [  325.806022]  ?
>>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [  325.806815]  ?
>>>>> __pfx___drm_dev_dbg+0x10/0x10 [drm] [  325.806894]  ?
>>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [  325.807686]
>>>>> drm_ioctl_kernel+0x13d/0x2b0 [drm] [  325.807767]  ?
>>>>> __pfx_file_has_perm+0x10/0x10 [  325.807777]  ?
>>>>> __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [  325.807857]
>>>>> drm_ioctl+0x4be/0xae0 [drm] [  325.807936]  ?
>>>>> __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [  325.808728]  ?
>>>>> __pfx_sock_write_iter+0x10/0x10 [  325.808737]  ?
>>>>> __pfx_drm_ioctl+0x10/0x10 [drm] [  325.808816]  ?
>>>>> ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
>>>>> [  325.808823]  ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
>>>>> [  325.808827]  ? _raw_spin_lock_irqsave+0x86/0xd0 [  325.808835]  ?
>>>>> __pfx__raw_spin_lock_irqsave+0x10/0x10
>>>>> [  325.808841]  amdgpu_drm_ioctl+0xce/0x180 [amdgpu] [  325.809622]
>>>>> __x64_sys_ioctl+0x139/0x1c0 [  325.809630]  do_syscall_64+0x64/0x880
>>>>> [ 325.809638]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>> [  325.809645] RIP: 0033:0x7f205fd12e1d [  325.809650] Code: 04 25
>>>>> 28
>>>>> 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45
>>>>> b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff
>>>>> ff
>>>>> 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 [  325.809654] RSP:
>>>>> 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [
>>>>> 325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>>>>> 00007f205fd12e1d [  325.809663] RDX: 00007ffe9032b5b0 RSI:
>>>>> 00000000c0406448 RDI: 0000000000000006 [  325.809665] RBP:
>>>>> 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e [
>>>>> 325.809668] R10: 0000000000000000 R11: 0000000000000246 R12:
>>>>> 00000000c0406448 [  325.809670] R13: 0000000000000006 R14:
>>>>> 0000000000001000 R15: 0000000000000001 [  325.809675]  </TASK> [
>>>>> 325.809678]
>>>>>
>>>>
>> ============================================================
>>>> ======
>>>>>
>>>>> Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR
>> v2")
>>>>> Cc: Christian König <[email protected]>
>>>>> Cc: Alex Deucher <[email protected]>
>>>>> Signed-off-by: Srinivasan Shanmugam <[email protected]>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +++++++++--
>>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> index b0ba2bdaf43a..145cb222d5cf 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> @@ -759,9 +759,15 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
>>>> *adev,
>>>>>     if (r)
>>>>>             goto error;
>>>>>
>>>>> -   /* For MAP/REPLACE we also need to update the BO mappings. */
>>>>> +    /* For MAP/REPLACE we also need to update the BO mappings.
>>>>> +     * CLEAR operates on the VM address range only and can come in with
>>>>> +     * bo_va == NULL.
>>>>> +     */
>>>>>     if (operation == AMDGPU_VA_OP_MAP ||
>>>>>         operation == AMDGPU_VA_OP_REPLACE) {
>>>>> +           if (WARN_ON_ONCE(!bo_va))
>>>>> +                   goto error;
>>>>> +
>>>>>             r = amdgpu_vm_bo_update(adev, bo_va, false);
>>>>>             if (r)
>>>>>                     goto error;
>>>>> @@ -772,7 +778,8 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
>>>> *adev,
>>>>>     if (r)
>>>>>             goto error;
>>>>>
>>>>> -   if ((operation == AMDGPU_VA_OP_MAP ||
>>>>> +   if (bo_va &&
>>>>> +       (operation == AMDGPU_VA_OP_MAP ||
>>>>>          operation == AMDGPU_VA_OP_REPLACE) &&
>>>>
>>>> Something else must be broken here. We already check operation ==
>>>> AMDGPU_VA_OP_MAP or AMDGPU_VA_OP_REPLACE.
>>>>
>>>> That should be enough to Ensure that bo_va isn't NULL.
>>>
>>> Thanks for the review and the clarification. please correct me if I am 
>>> mistaken.
>>>
>>> - For MAP/REPLACE, we are working with a real buffer (BO)
>>
>> Not quite for PRT the BO is NULL.
> 
> Thanks for the clarification.
> 
> Ah, got it.
> 
> So for PRT, bo_va is valid but bo_va->bo is NULL.
> 
>>
>>> - So bo_va should always be valid in those cases
>>> - bo_va should be NULL only for CLEAR or PRT, where no real buffer is
>>> used
>>
>> No, bo_va is a valid pointer for MAP/REPLACE, but for clear it is NULL.
>>
>> For PRT bo_va->bo is NULL.
> 
> Thanks, that helps clarify.
> 
> So:
> - CLEAR → bo_va = NULL
> - PRT → bo_va is valid, but bo_va->bo = NULL
> - MAP/REPLACE → bo_va should always be valid
> 
> I was incorrectly treating PRT similar to CLEAR earlier.
> 
>>
>>>
>>> Based on this, it seems the issue may not be in
>>> amdgpu_gem_va_update_vm(), but earlier in amdgpu_gem_va_ioctl(), in the part
>> where we decide:
>>>
>>> - which buffer to use (abo)
>>> - and where it is mapped (bo_va)
>>>
>>> This is how I currently understand the flow:
>>>
>>> 1. First, we check the operation and flags
>>>
>>>    - If it is NOT CLEAR and NOT PRT,
>>>      we take the normal path and get the real buffer (abo) from the
>>> handle
>>
>> Ah, I see. Yeah, that is most likely wrong.
>>
>> We should use the special PRT bo_va in fpriv for that case, otherwise 
>> userqueues
>> would have quite a problem.
> 
> Got it, thanks.
> 
> So for the PRT case, we should use fpriv->prt_va instead of treating it like 
> a NULL case.
> 
> I will fix the ioctl() path to ensure PRT uses prt_va correctly.
> 
> Based on this, I will focus on fixing the bo_va selection in 
> amdgpu_gem_va_ioctl(),
> so that PRT uses prt_va and bo_va is not incorrectly NULL.
> 
> Please let me know if this direction looks correct.

That sounds reasonable to me, yes.

Thanks,
Christian.

> 
> Best regards,
> Srini
> 
>>
>> Thanks,
>> Christian.

Reply via email to