On 3/25/26 14:18, SHANMUGAM, SRINIVASAN wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Wednesday, March 25, 2026 5:39 PM
>> To: SHANMUGAM, SRINIVASAN <[email protected]>;
>> Deucher, Alexander <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in
>> VA update path
>>
>> On 3/25/26 12:58, Srinivasan Shanmugam wrote:
>>> PRT (Page Request Table) mappings are not backed by a real buffer.  In
>>
>> PRT (Partial Resident Texture).
>>
>>> this case, bo_va is valid, but bo_va->bo is NULL, meaning the mapping
>>> exists but does not point to any real buffer object.
>>>
>>> amdgpu_gem_va_ioctl() currently mixes CLEAR and PRT handling, which
>>> can result in incorrect bo_va selection. CLEAR should use bo_va =
>>> NULL, while PRT should use the special fpriv->prt_va mapping.
>>>
>>> Fix this by clearly selecting bo_va:
>>> - use fpriv->prt_va for PRT
>>> - use NULL only for CLEAR
>>> - use amdgpu_vm_bo_find() for normal BO mappings
>>>
>>> Also, amdgpu_gem_va_update_vm() accesses bo_va->base.bo without
>>> checking if it is NULL. This is not valid for PRT mappings.
>>>
>>> This keeps CLEAR, PRT, and normal cases separate and avoids invalid
>>> memory access.
>>>
>>> Cc: Alex Deucher <[email protected]>
>>> Suggested-by: Christian König <[email protected]>
>>> Signed-off-by: Srinivasan Shanmugam <[email protected]>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++++++++++++----
>>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index b0ba2bdaf43a..289d6b58b579 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -772,8 +772,10 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
>> *adev,
>>>     if (r)
>>>             goto error;
>>>
>>> +   /* Only do BO-specific handling if this VA is backed by a real BO */
>>>     if ((operation == AMDGPU_VA_OP_MAP ||
>>>          operation == AMDGPU_VA_OP_REPLACE) &&
>>> +       bo_va->base.bo &&
>>
>> That is not correct. This branch here should also be taken for PRT mappings.
>>
>>>         !amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo)) {
>>>
>>>             /*
>>> @@ -909,15 +911,23 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
>> void *data,
>>>                     goto error;
>>>     }
>>>
>>> -   /* Resolve the BO-VA mapping for this VM/BO combination. */
>>> -   if (abo) {
>>> +   /* Resolve the BO-VA mapping for this VM/BO combination.
>>> +    *
>>> +    * Depending on the case decide bo_va:
>>> +    * - PRT: use special per-file prt_va (bo_va valid, but bo_va->bo == 
>>> NULL)
>>> +    * - CLEAR: no BO involved → bo_va = NULL
>>> +    * - Normal BO path: lookup mapping from VM
>>> +    */
>>> +   if (args->flags & AMDGPU_VM_PAGE_PRT) {
>>> +           bo_va = fpriv->prt_va;
>>> +   } else if (args->operation == AMDGPU_VA_OP_CLEAR) {
>>> +           bo_va = NULL;
>>> +   } else if (abo) {
>>>             bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
>>>             if (!bo_va) {
>>>                     r = -ENOENT;
>>>                     goto error;
>>>             }
>>> -   } else if (args->operation != AMDGPU_VA_OP_CLEAR) {
>>> -           bo_va = fpriv->prt_va;
>>
>> That code already looks correct to me. I don't think we need to change 
>> anything
>> here.
>>
>> Where is your crash actually coming from?
> 
> Hi Christian,
> 
> The issue was observed in CI during IGT (amd_bo) runs, but I have not
> yet been able to reproduce it locally. Will continue investigating to
> identify the exact failing path.

That is most likely something completely different. As far as I can see the 
bo_va handling is correct.

> 
> Below is the crash signature for reference:
> 
> BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> Write of size 4 at addr 0000000000000000 by task amd_bo

That sounds a bit like the fallout from Pike's patch:

    drm/amdgpu: fix syncobj leak for amdgpu_gem_va_ioctl()
    
    It requires freeing the syncobj and chain
    alloction resource.

Not sure what exactly goes wrong here.

Regards,
Christian.

> 
> RIP: amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
> CR2: 0000000000000000
> 
> I also tried to map the crash offset using gdb/objdump, but the results
> were not conclusive. The reported amdgpu_gem_va_ioctl+0x380 offset did
> not map cleanly to a single obvious source line.
> 
> So at this point I can localize the crash to amdgpu_gem_va_ioctl(), but
> still need to identify the exact failing pointer/path.
> 
> 
> [  325.779102] 
> ==================================================================
> [  325.786483] BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 
> [amdgpu]
> [  325.795105] Write of size 4 at addr 0000000000000000 by task amd_bo/7893
> [  325.801997]
> [  325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted 
> 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
> [  325.803602] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS 
> V1.03.B10 04/01/2019
> [  325.803606] Call Trace:
> [  325.803609]  <TASK>
> [  325.803612]  dump_stack_lvl+0x64/0x80
> [  325.803623]  kasan_report+0xb8/0xf0
> [  325.803631]  ? amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> [  325.804427]  kasan_check_range+0x105/0x1b0
> [  325.804432]  amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> [  325.805229]  ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
> [  325.806022]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  325.806815]  ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
> [  325.806894]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  325.807686]  drm_ioctl_kernel+0x13d/0x2b0 [drm]
> [  325.807767]  ? __pfx_file_has_perm+0x10/0x10
> [  325.807777]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
> [  325.807857]  drm_ioctl+0x4be/0xae0 [drm]
> [  325.807936]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  325.808728]  ? __pfx_sock_write_iter+0x10/0x10
> [  325.808737]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
> [  325.808816]  ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
> [  325.808823]  ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> [  325.808827]  ? _raw_spin_lock_irqsave+0x86/0xd0
> [  325.808835]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  325.808841]  amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
> [  325.809622]  __x64_sys_ioctl+0x139/0x1c0
> [  325.809630]  do_syscall_64+0x64/0x880
> [  325.809638]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  325.809645] RIP: 0033:0x7f205fd12e1d
> [  325.809650] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 
> 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 
> 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> [  325.809654] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000010
> [  325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
> 00007f205fd12e1d
> [  325.809663] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 
> 0000000000000006
> [  325.809665] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 
> 000000000000000e
> [  325.809668] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 00000000c0406448
> [  325.809670] R13: 0000000000000006 R14: 0000000000001000 R15: 
> 0000000000000001
> [  325.809675]  </TASK>
> [  325.809678] 
> ==================================================================
> [  326.029964] Disabling lock debugging due to kernel taint
> [  326.035486] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  326.042557] #PF: supervisor write access in kernel mode
> [  326.047887] #PF: error_code(0x0002) - not-present page
> [  326.053132] PGD 0 P4D 0
> [  326.055766] Oops: Oops: 0002 [#1] SMP KASAN NOPTI
> [  326.060577] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Tainted: G    B          
>      6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 
> PREEMPT(voluntary)
> [  326.074815] Tainted: [B]=BAD_PAGE
> [  326.078233] Hardware name: TYAN B8021G88V2HR-2T/7] RIP: 
> 0010:amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
> [  326.093279] Code: 00 00 75 aa 85 c0 74 a6 41 89 c7 31 ed 45 31 f6 48 89 ef 
> e8 dd bf 09 ce be 04 00 00 00 4c 89 f7 e8 90 0e 13 ce b8 ff ff ff ff <f0> 41 
> 0f c1 06 83 f8 01 0f 84 3c 05 00 00 85 c0 0f 8e 75 05 00 00
> [  326.112237] RSP: 0018:ffff88a0d02d7b60 EFLAGS: 00010246
> [  326.117568] RAX: 00000000ffffffff RBX: ffff88907f0c2848 RCX: 
> ffffffff8f43434a
> [  326.124813] RDX: fffffbfff2a16c0d RSI: 0000000000000008 RDI: 
> ffffffff950b6060
> [  326.132056] RBP: 0000000000000000 R08: 0000000000000001 R09: 
> fffffbfff2a16c0c
> [  326.139303] R10: ffffffff950b6067 R11: 0000000000000001 R12: 
> ffff88b1349d7778
> [  326.146548] R13: ffff88a0d02d7c00 R14: 0000000000000000 R15: 
> 0000000000000000
> [  326.153794] FS:  00007f205dbad940(0000) GS:ffff88c00aa09000(0000) 
> knlGS:0000000000000000
> [  326.162023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  326.167872] CR2: 0000000000000000 CR3: 000000207940e000 CR4: 
> 00000000003506f0
> [  326.175113] Call Trace:
> [  326.177661]  <TASK>
> [  326.179861]  ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
> [  326.186637]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  326.193168]  ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
> [  326.198141]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  326.204608]  drm_ioctl_kernel+0x13d/0x2b0 [drm]
> [  326.209319]  ? __pfx_file_has_perm+0x10/0x10
> [  326.213696]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
> [  326.218934]  drm_ioctl+0x4be/0xae0 [drm]
> [  326.223109]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [  326.229576]  ? __pfx_sock_write_iter+0x10/0x10
> [  326.234130]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
> [  326.238752]  ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
> [  326.244518]  ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> [  326.250630]  ? _raw_spin_lock_irqsave+0x86/0xd0
> [  326.255268]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  326.260429]  amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
> [  326.266018]  __x64_sys_ioctl+0x139/0x1c0
> [  326.270056]  do_syscall_64+0x64/0x880
> [  326.273827]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  326.278983] RIP: 0033:0x7f205fd12e1d
> [  326.282660] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 
> 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 
> 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> [  326.301609] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000010
> [  326.309316] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
> 00007f205fd12e1d
> [  326.316560] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 
> 0000000000000006
> [  326.323855] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 
> 000000000000000e
> [  326.331103] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 00000000c0406448
> [  326.338347] R13: 0000000000000006 R14: 0000000000001000 R15: 
> 0000000000000001
> [  326.345595]  </TASK>
> 
> Thanks!
> Srini
> 
>>
>> Regards,
>> Christian.
>>
>>>     } else {
>>>             bo_va = NULL;
>>>     }
> 
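
For reference, the pre-patch branch order that the review calls correct can be modelled in user space. This is only a sketch: `struct bo_va`, `enum va_op`, `select_bo_va()` and its parameters are hypothetical stand-ins, not the amdgpu types, and the `-ENOENT` error path of the real lookup is reduced to returning whatever the caller passes as `found`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's bo_va; only what the sketch needs. */
struct bo_va { int is_prt; };

/* Hypothetical operation codes, named after the AMDGPU_VA_OP_* values. */
enum va_op { VA_OP_MAP, VA_OP_UNMAP, VA_OP_CLEAR, VA_OP_REPLACE };

/*
 * Mirrors the branch order of the removed and context lines in the diff:
 *  - a BO was supplied: use the mapping found for that BO
 *  - no BO and the operation is not CLEAR: use the per-file PRT mapping
 *  - no BO and the operation is CLEAR: no bo_va at all
 */
static struct bo_va *select_bo_va(int have_bo, enum va_op op,
                                  struct bo_va *found, struct bo_va *prt_va)
{
	if (have_bo)
		return found;
	if (op != VA_OP_CLEAR)
		return prt_va;
	return NULL;
}
```

In this model CLEAR can only reach the NULL case when no BO is supplied, and a request without a BO that is not CLEAR always lands on the PRT mapping, so the three cases are already kept apart.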

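One possible reading of the faulting bytes in the Oops above, offered as a sketch and not as a diagnosis: `b8 ff ff ff ff` followed by `<f0> 41 0f c1 06` decodes to loading -1 into EAX and then `lock xadd %eax,(%r14)`, and the register dump shows R14 = 0. That is the shape of an atomic reference-count decrement applied to a NULL pointer, which would match the 4-byte write at address 0 that KASAN reports. Which object is being released cannot be told from the trace alone. The user-space model below uses hypothetical names (`struct obj`, `obj_init()`, `obj_put()`) and C11 atomics, not the kernel's refcount API; it only illustrates why an unguarded put on NULL writes to address 0.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object with its reference count as the first member. */
struct obj {
	atomic_int refcount;
	int released;
};

static void obj_init(struct obj *o, int refs)
{
	atomic_init(&o->refcount, refs);
	o->released = 0;
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int obj_put(struct obj *o)
{
	/* Without this guard the fetch-and-add below would write to address 0. */
	if (!o)
		return 0;

	/* Atomically add -1 and get the old value back; old value 1 means last ref. */
	if (atomic_fetch_add(&o->refcount, -1) == 1) {
		o->released = 1;
		return 1;
	}
	return 0;
}
```

If this reading is right, the thing to look for in amdgpu_gem_va_ioctl() is a cleanup path that drops a reference on a pointer that is still NULL on some exits.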