> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Wednesday, March 25, 2026 5:39 PM
> To: SHANMUGAM, SRINIVASAN <[email protected]>;
> Deucher, Alexander <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in
> VA update path
>
> On 3/25/26 12:58, Srinivasan Shanmugam wrote:
> > PRT (Page Request Table) mappings are not backed by a real buffer. In
>
> PRT (Partial Resident Texture).
>
> > this case, bo_va is valid, but bo_va->bo is NULL, meaning the mapping
> > exists but does not point to any real buffer object.
> >
> > amdgpu_gem_va_ioctl() currently mixes CLEAR and PRT handling, which
> > can result in incorrect bo_va selection. CLEAR should use bo_va =
> > NULL, while PRT should use the special fpriv->prt_va mapping.
> >
> > Fix this by clearly selecting bo_va:
> > - use fpriv->prt_va for PRT
> > - use NULL only for CLEAR
> > - use amdgpu_vm_bo_find() for normal BO mappings
> >
> > Also, amdgpu_gem_va_update_vm() accesses bo_va->base.bo without
> > checking if it is NULL. This is not valid for PRT mappings.
> >
> > This keeps CLEAR, PRT, and normal cases separate and avoids invalid
> > memory access.
> >
> > Cc: Alex Deucher <[email protected]>
> > Suggested-by: Christian König <[email protected]>
> > Signed-off-by: Srinivasan Shanmugam <[email protected]>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > index b0ba2bdaf43a..289d6b58b579 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > @@ -772,8 +772,10 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
> *adev,
> > if (r)
> > goto error;
> >
> > + /* Only do BO-specific handling if this VA is backed by a real BO */
> > if ((operation == AMDGPU_VA_OP_MAP ||
> > operation == AMDGPU_VA_OP_REPLACE) &&
> > + bo_va->base.bo &&
>
> That is not correct. This branch here should also be taken for PRT mappings.
>
> > !amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo)) {
> >
> > /*
> > @@ -909,15 +911,23 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
> void *data,
> > goto error;
> > }
> >
> > - /* Resolve the BO-VA mapping for this VM/BO combination. */
> > - if (abo) {
> > + /* Resolve the BO-VA mapping for this VM/BO combination.
> > + *
> > + * Depending on the case decide bo_va:
> > + * - PRT: use special per-file prt_va (bo_va valid, but bo_va->bo ==
> > NULL)
> > + * - CLEAR: no BO involved → bo_va = NULL
> > + * - Normal BO path: lookup mapping from VM
> > + */
> > + if (args->flags & AMDGPU_VM_PAGE_PRT) {
> > + bo_va = fpriv->prt_va;
> > + } else if (args->operation == AMDGPU_VA_OP_CLEAR) {
> > + bo_va = NULL;
> > + } else if (abo) {
> > bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
> > if (!bo_va) {
> > r = -ENOENT;
> > goto error;
> > }
> > - } else if (args->operation != AMDGPU_VA_OP_CLEAR) {
> > - bo_va = fpriv->prt_va;
>
> That code already looks correct to me. I don't think we need to change
> anything here.
>
> Where is your crash actually coming from?
Hi Christian,

The issue was observed in CI during IGT (amd_bo) runs, but I have not
yet been able to reproduce it locally. I will continue investigating to
identify the exact failing path.

Below is the crash signature for reference:

BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
Write of size 4 at addr 0000000000000000 by task amd_bo
RIP: amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
CR2: 0000000000000000

I also tried to map the crash offset using gdb/objdump, but the results
were not conclusive. The reported amdgpu_gem_va_ioctl+0x380 offset did
not map cleanly to a single obvious source line.

So at this point I can localize the crash to amdgpu_gem_va_ioctl(), but
I still need to identify the exact failing pointer/path.
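
For reference while narrowing this down, the two bo_va selection orders
discussed in this thread (the existing code and the one proposed in the
patch) can be compared in a small userspace model. This is only a sketch:
the enum and function names are hypothetical stand-ins, not amdgpu types,
the -ENOENT failure of amdgpu_vm_bo_find() is ignored, and the three inputs
are treated as independent even though earlier code in the ioctl may make
some combinations unreachable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three possible outcomes; not amdgpu types. */
enum bo_va_choice { CHOICE_NULL, CHOICE_PRT_VA, CHOICE_BO_FIND };

/* Existing order: BO lookup first, then prt_va for any non-CLEAR op. */
static enum bo_va_choice select_existing(bool have_abo, bool op_is_clear)
{
    if (have_abo)
        return CHOICE_BO_FIND;
    if (!op_is_clear)
        return CHOICE_PRT_VA;
    return CHOICE_NULL;
}

/* Order proposed in the patch: PRT flag first, then CLEAR, then BO lookup. */
static enum bo_va_choice select_patched(bool prt_flag, bool have_abo,
                                        bool op_is_clear)
{
    if (prt_flag)
        return CHOICE_PRT_VA;
    if (op_is_clear)
        return CHOICE_NULL;
    if (have_abo)
        return CHOICE_BO_FIND;
    return CHOICE_NULL;
}

/* Count the input combinations for which the two orders disagree. */
static int count_differences(void)
{
    int n = 0;

    for (int prt = 0; prt <= 1; prt++)
        for (int abo = 0; abo <= 1; abo++)
            for (int clear = 0; clear <= 1; clear++)
                if (select_existing(abo, clear) !=
                    select_patched(prt, abo, clear))
                    n++;
    return n;
}
```

In this model the two orders disagree on 5 of the 8 input combinations.
Which of those combinations userspace can actually reach depends on how
abo is resolved earlier in amdgpu_gem_va_ioctl(), which is not shown in
this thread.
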
[ 325.779102] ==================================================================
[ 325.786483] BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
[ 325.795105] Write of size 4 at addr 0000000000000000 by task amd_bo/7893
[ 325.801997]
[ 325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
[ 325.803602] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
[ 325.803606] Call Trace:
[ 325.803609] <TASK>
[ 325.803612] dump_stack_lvl+0x64/0x80
[ 325.803623] kasan_report+0xb8/0xf0
[ 325.803631] ? amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
[ 325.804427] kasan_check_range+0x105/0x1b0
[ 325.804432] amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
[ 325.805229] ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
[ 325.806022] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 325.806815] ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
[ 325.806894] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 325.807686] drm_ioctl_kernel+0x13d/0x2b0 [drm]
[ 325.807767] ? __pfx_file_has_perm+0x10/0x10
[ 325.807777] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[ 325.807857] drm_ioctl+0x4be/0xae0 [drm]
[ 325.807936] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 325.808728] ? __pfx_sock_write_iter+0x10/0x10
[ 325.808737] ? __pfx_drm_ioctl+0x10/0x10 [drm]
[ 325.808816] ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
[ 325.808823] ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
[ 325.808827] ? _raw_spin_lock_irqsave+0x86/0xd0
[ 325.808835] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 325.808841] amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
[ 325.809622] __x64_sys_ioctl+0x139/0x1c0
[ 325.809630] do_syscall_64+0x64/0x880
[ 325.809638] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 325.809645] RIP: 0033:0x7f205fd12e1d
[ 325.809650] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 325.809654] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
[ 325.809663] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
[ 325.809665] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
[ 325.809668] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
[ 325.809670] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
[ 325.809675] </TASK>
[ 325.809678] ==================================================================
[ 326.029964] Disabling lock debugging due to kernel taint
[ 326.035486] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 326.042557] #PF: supervisor write access in kernel mode
[ 326.047887] #PF: error_code(0x0002) - not-present page
[ 326.053132] PGD 0 P4D 0
[ 326.055766] Oops: Oops: 0002 [#1] SMP KASAN NOPTI
[ 326.060577] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Tainted: G B 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
[ 326.074815] Tainted: [B]=BAD_PAGE
[ 326.078233] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
[ ...7] RIP: 0010:amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
[ 326.093279] Code: 00 00 75 aa 85 c0 74 a6 41 89 c7 31 ed 45 31 f6 48 89 ef e8 dd bf 09 ce be 04 00 00 00 4c 89 f7 e8 90 0e 13 ce b8 ff ff ff ff <f0> 41 0f c1 06 83 f8 01 0f 84 3c 05 00 00 85 c0 0f 8e 75 05 00 00
[ 326.112237] RSP: 0018:ffff88a0d02d7b60 EFLAGS: 00010246
[ 326.117568] RAX: 00000000ffffffff RBX: ffff88907f0c2848 RCX: ffffffff8f43434a
[ 326.124813] RDX: fffffbfff2a16c0d RSI: 0000000000000008 RDI: ffffffff950b6060
[ 326.132056] RBP: 0000000000000000 R08: 0000000000000001 R09: fffffbfff2a16c0c
[ 326.139303] R10: ffffffff950b6067 R11: 0000000000000001 R12: ffff88b1349d7778
[ 326.146548] R13: ffff88a0d02d7c00 R14: 0000000000000000 R15: 0000000000000000
[ 326.153794] FS: 00007f205dbad940(0000) GS:ffff88c00aa09000(0000) knlGS:0000000000000000
[ 326.162023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 326.167872] CR2: 0000000000000000 CR3: 000000207940e000 CR4: 00000000003506f0
[ 326.175113] Call Trace:
[ 326.177661] <TASK>
[ 326.179861] ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
[ 326.186637] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 326.193168] ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
[ 326.198141] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 326.204608] drm_ioctl_kernel+0x13d/0x2b0 [drm]
[ 326.209319] ? __pfx_file_has_perm+0x10/0x10
[ 326.213696] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[ 326.218934] drm_ioctl+0x4be/0xae0 [drm]
[ 326.223109] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 326.229576] ? __pfx_sock_write_iter+0x10/0x10
[ 326.234130] ? __pfx_drm_ioctl+0x10/0x10 [drm]
[ 326.238752] ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
[ 326.244518] ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
[ 326.250630] ? _raw_spin_lock_irqsave+0x86/0xd0
[ 326.255268] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 326.260429] amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
[ 326.266018] __x64_sys_ioctl+0x139/0x1c0
[ 326.270056] do_syscall_64+0x64/0x880
[ 326.273827] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 326.278983] RIP: 0033:0x7f205fd12e1d
[ 326.282660] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 326.301609] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 326.309316] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
[ 326.316560] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
[ 326.323855] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
[ 326.331103] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
[ 326.338347] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
[ 326.345595] </TASK>
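
A hedged reading of the kernel-mode "Code:" line in the second report
above: the bytes at the fault marker, <f0> 41 0f c1 06, decode to
"lock xadd %eax,(%r14)", the preceding b8 ff ff ff ff is
"mov $0xffffffff,%eax", and R14 is 0 in the register dump. That is an
atomic 32-bit add of -1 at address 0, which matches "Write of size 4 at
addr 0000000000000000". The compare against 1 that follows (83 f8 01 0f 84)
resembles a reference-count decrement-and-test. This would be consistent
with a put-style helper being called on a NULL object whose reference count
sits at offset 0, but that is an assumption; which object it is cannot be
read from the dump alone. The sketch below uses hypothetical names
(counted_obj, obj_put_unchecked, obj_put_checked) and C11 atomics, not
kernel APIs, to show the pattern:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with the reference count as its first member. */
struct counted_obj {
    atomic_int refcount;
};

/*
 * Decrement-and-test: returns true when the caller dropped the last
 * reference. On x86-64 an atomic fetch-and-subtract of 1 is typically
 * emitted as LOCK XADD with -1. Calling this with obj == NULL would
 * write 4 bytes at address 0.
 */
static bool obj_put_unchecked(struct counted_obj *obj)
{
    return atomic_fetch_sub(&obj->refcount, 1) == 1;
}

/* Same operation, but a NULL object is ignored instead of dereferenced. */
static bool obj_put_checked(struct counted_obj *obj)
{
    if (!obj)
        return false;
    return obj_put_unchecked(obj);
}
```

If this reading holds, checking which put calls in amdgpu_gem_va_ioctl()
can be reached with a NULL pointer may narrow down the failing path.
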
Thanks!
Srini
>
> Regards,
> Christian.
>
> > } else {
> > bo_va = NULL;
> > }