[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Wednesday, March 25, 2026 5:39 PM
> To: SHANMUGAM, SRINIVASAN <[email protected]>;
> Deucher, Alexander <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in
> VA update path
>
> On 3/25/26 12:58, Srinivasan Shanmugam wrote:
> > PRT (Page Request Table) mappings are not backed by a real buffer.  In
>
> PRT (Partial Resident Texture).
>
> > this case, bo_va is valid, but bo_va->bo is NULL, meaning the mapping
> > exists but does not point to any real buffer object.
> >
> > amdgpu_gem_va_ioctl() currently mixes CLEAR and PRT handling, which
> > can result in incorrect bo_va selection. CLEAR should use bo_va =
> > NULL, while PRT should use the special fpriv->prt_va mapping.
> >
> > Fix this by clearly selecting bo_va:
> > - use fpriv->prt_va for PRT
> > - use NULL only for CLEAR
> > - use amdgpu_vm_bo_find() for normal BO mappings
> >
> > Also, amdgpu_gem_va_update_vm() accesses bo_va->base.bo without
> > checking if it is NULL. This is not valid for PRT mappings.
> >
> > This keeps CLEAR, PRT, and normal cases separate and avoids invalid
> > memory access.
> >
> > Cc: Alex Deucher <[email protected]>
> > Suggested-by: Christian König <[email protected]>
> > Signed-off-by: Srinivasan Shanmugam <[email protected]>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > index b0ba2bdaf43a..289d6b58b579 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > @@ -772,8 +772,10 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
> *adev,
> >     if (r)
> >             goto error;
> >
> > +   /* Only do BO-specific handling if this VA is backed by a real BO */
> >     if ((operation == AMDGPU_VA_OP_MAP ||
> >          operation == AMDGPU_VA_OP_REPLACE) &&
> > +       bo_va->base.bo &&
>
> That is not correct. This branch here should also be taken for PRT mappings.
>
> >         !amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo)) {
> >
> >             /*
> > @@ -909,15 +911,23 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
> void *data,
> >                     goto error;
> >     }
> >
> > -   /* Resolve the BO-VA mapping for this VM/BO combination. */
> > -   if (abo) {
> > +   /* Resolve the BO-VA mapping for this VM/BO combination.
> > +    *
> > +    * Depending on the case decide bo_va:
> > +    * - PRT: use special per-file prt_va (bo_va valid, but bo_va->bo == 
> > NULL)
> > +    * - CLEAR: no BO involved → bo_va = NULL
> > +    * - Normal BO path: lookup mapping from VM
> > +    */
> > +   if (args->flags & AMDGPU_VM_PAGE_PRT) {
> > +           bo_va = fpriv->prt_va;
> > +   } else if (args->operation == AMDGPU_VA_OP_CLEAR) {
> > +           bo_va = NULL;
> > +   } else if (abo) {
> >             bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
> >             if (!bo_va) {
> >                     r = -ENOENT;
> >                     goto error;
> >             }
> > -   } else if (args->operation != AMDGPU_VA_OP_CLEAR) {
> > -           bo_va = fpriv->prt_va;
>
> That code already looks correct to me. I don't think we need to change 
> anything
> here.
>
> Where is your crash actually coming from?

Hi Christian,

The issue was observed in CI during IGT (amd_bo) runs, but I have not
yet been able to reproduce it locally. I will continue investigating to
identify the exact failing path.

Below is the crash signature for reference:

BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
Write of size 4 at addr 0000000000000000 by task amd_bo

RIP: amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
CR2: 0000000000000000

I also tried to map the crash offset using gdb/objdump, but the results
were not conclusive. The reported amdgpu_gem_va_ioctl+0x380 offset did
not map cleanly to a single obvious source line.

So at this point I can localize the crash to amdgpu_gem_va_ioctl(), but
still need to identify the exact failing pointer/path.
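
One thing the oops itself does show: the Code: bytes at the faulting RIP,
"b8 ff ff ff ff <f0> 41 0f c1 06", decode to "mov $0xffffffff,%eax"
followed by "lock xadd %eax,(%r14)", and R14 is 0 in the register dump.
That is an atomic 4-byte decrement through a NULL pointer, which matches
the KASAN "Write of size 4 at addr 0" report and looks like a
reference-count put on an object pointer that was never set. Which object
that is remains unidentified. The sketch below is only a user-space model
of that pattern in C11; model_obj, model_put() and model_put_safe() are
hypothetical names and not amdgpu or DRM code.

```c
/* User-space model only: hypothetical names, not amdgpu/DRM code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Stand-in for a kernel object whose reference count is its first
 * member. With the count at offset 0, a put on a NULL object pointer
 * touches address 0 (4 bytes wide on common ABIs).
 */
struct model_obj {
	atomic_int refcount;
	int released;
};

/*
 * Unguarded put: atomically subtract 1 and release when the previous
 * value was 1. On x86-64 the fetch-and-subtract is typically emitted
 * as "lock xadd". Calling this with obj == NULL would fault.
 */
static void model_put(struct model_obj *obj)
{
	int old = atomic_fetch_sub(&obj->refcount, 1);

	if (old == 1)
		obj->released = 1;
}

/* Guarded put for cleanup paths where the pointer may still be NULL. */
static void model_put_safe(struct model_obj *obj)
{
	if (obj)
		model_put(obj);
}

/* Takes two references, drops both, and reports whether release ran. */
static int model_demo(void)
{
	struct model_obj o;

	atomic_init(&o.refcount, 2);
	o.released = 0;

	model_put_safe(NULL);	/* no-op instead of a NULL dereference */
	model_put_safe(&o);	/* 2 -> 1, not released yet */
	model_put_safe(&o);	/* 1 -> 0, released */

	return o.released;
}
```

If the real culprit turns out to be a put helper of this shape on an
error or cleanup path, the choice would be between guarding the call as
in model_put_safe() and making sure the pointer is always valid before
that path is reached.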


[  325.779102] 
==================================================================
[  325.786483] BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 
[amdgpu]
[  325.795105] Write of size 4 at addr 0000000000000000 by task amd_bo/7893
[  325.801997]
[  325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted 
6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
[  325.803602] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS 
V1.03.B10 04/01/2019
[  325.803606] Call Trace:
[  325.803609]  <TASK>
[  325.803612]  dump_stack_lvl+0x64/0x80
[  325.803623]  kasan_report+0xb8/0xf0
[  325.803631]  ? amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
[  325.804427]  kasan_check_range+0x105/0x1b0
[  325.804432]  amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
[  325.805229]  ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
[  325.806022]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  325.806815]  ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
[  325.806894]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  325.807686]  drm_ioctl_kernel+0x13d/0x2b0 [drm]
[  325.807767]  ? __pfx_file_has_perm+0x10/0x10
[  325.807777]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  325.807857]  drm_ioctl+0x4be/0xae0 [drm]
[  325.807936]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  325.808728]  ? __pfx_sock_write_iter+0x10/0x10
[  325.808737]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  325.808816]  ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
[  325.808823]  ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
[  325.808827]  ? _raw_spin_lock_irqsave+0x86/0xd0
[  325.808835]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  325.808841]  amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
[  325.809622]  __x64_sys_ioctl+0x139/0x1c0
[  325.809630]  do_syscall_64+0x64/0x880
[  325.809638]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  325.809645] RIP: 0033:0x7f205fd12e1d
[  325.809650] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 
10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 
00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[  325.809654] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
[  325.809663] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
[  325.809665] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
[  325.809668] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
[  325.809670] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
[  325.809675]  </TASK>
[  325.809678] 
==================================================================
[  326.029964] Disabling lock debugging due to kernel taint
[  326.035486] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  326.042557] #PF: supervisor write access in kernel mode
[  326.047887] #PF: error_code(0x0002) - not-present page
[  326.053132] PGD 0 P4D 0
[  326.055766] Oops: Oops: 0002 [#1] SMP KASAN NOPTI
[  326.060577] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Tainted: G    B            
   6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
[  326.074815] Tainted: [B]=BAD_PAGE
[  326.078233] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS 
V1.03.B10 04/01/2019
[...] RIP: 0010:amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
[  326.093279] Code: 00 00 75 aa 85 c0 74 a6 41 89 c7 31 ed 45 31 f6 48 89 ef 
e8 dd bf 09 ce be 04 00 00 00 4c 89 f7 e8 90 0e 13 ce b8 ff ff ff ff <f0> 41 0f 
c1 06 83 f8 01 0f 84 3c 05 00 00 85 c0 0f 8e 75 05 00 00
[  326.112237] RSP: 0018:ffff88a0d02d7b60 EFLAGS: 00010246
[  326.117568] RAX: 00000000ffffffff RBX: ffff88907f0c2848 RCX: ffffffff8f43434a
[  326.124813] RDX: fffffbfff2a16c0d RSI: 0000000000000008 RDI: ffffffff950b6060
[  326.132056] RBP: 0000000000000000 R08: 0000000000000001 R09: fffffbfff2a16c0c
[  326.139303] R10: ffffffff950b6067 R11: 0000000000000001 R12: ffff88b1349d7778
[  326.146548] R13: ffff88a0d02d7c00 R14: 0000000000000000 R15: 0000000000000000
[  326.153794] FS:  00007f205dbad940(0000) GS:ffff88c00aa09000(0000) 
knlGS:0000000000000000
[  326.162023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  326.167872] CR2: 0000000000000000 CR3: 000000207940e000 CR4: 00000000003506f0
[  326.175113] Call Trace:
[  326.177661]  <TASK>
[  326.179861]  ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
[  326.186637]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  326.193168]  ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
[  326.198141]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  326.204608]  drm_ioctl_kernel+0x13d/0x2b0 [drm]
[  326.209319]  ? __pfx_file_has_perm+0x10/0x10
[  326.213696]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  326.218934]  drm_ioctl+0x4be/0xae0 [drm]
[  326.223109]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  326.229576]  ? __pfx_sock_write_iter+0x10/0x10
[  326.234130]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  326.238752]  ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
[  326.244518]  ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
[  326.250630]  ? _raw_spin_lock_irqsave+0x86/0xd0
[  326.255268]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  326.260429]  amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
[  326.266018]  __x64_sys_ioctl+0x139/0x1c0
[  326.270056]  do_syscall_64+0x64/0x880
[  326.273827]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  326.278983] RIP: 0033:0x7f205fd12e1d
[  326.282660] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 
10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 
00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[  326.301609] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  326.309316] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
[  326.316560] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
[  326.323855] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
[  326.331103] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
[  326.338347] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
[  326.345595]  </TASK>

Thanks!
Srini

>
> Regards,
> Christian.
>
> >     } else {
> >             bo_va = NULL;
> >     }
