On 2025-11-14 12:02, Russell, Kent wrote:
-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Andrew Martin
Sent: Friday, November 14, 2025 9:41 AM
To: [email protected]
Cc: Martin, Andrew <[email protected]>
Subject: [PATCH] drm/amdkfd: FORWARD NULL
This patch fixes cases where the code proceeds with a potentially NULL
pointer without checking it.
Signed-off-by: Andrew Martin <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 2 ++
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +++++-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 +++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 1ef758ac5076..71b7db5de69f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -104,6 +104,8 @@ static const char *amdkfd_fence_get_driver_name(struct dma_fence *f)
static const char *amdkfd_fence_get_timeline_name(struct dma_fence *f)
{
struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+ if (!fence)
+ return NULL;
Felix can hopefully confirm, but we may need to return something here,
since callers expect a valid timeline name, e.g. in:
#define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
For amdgpu job fences the above makes sense, but KFD fences are our
eviction fences. There is no job associated with them.
I don't think the NULL check is needed here. to_amdgpu_amdkfd_fence
returns NULL if f is NULL or if the fence is not an
amdgpu_amdkfd_fence, based on the fence_ops. But we're in a fence_ops
callback here that should only be called for amdgpu_amdkfd_fences.
That said, if you need a check to satisfy a static checker, I suggest this:
return fence ? fence->timeline_name : NULL;
return fence->timeline_name;
}
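The point about to_amdgpu_amdkfd_fence can be illustrated with a small user-space sketch. All `mock_*` names, the struct layouts, and `mock_container_of` are hypothetical stand-ins for struct dma_fence, container_of and to_amdgpu_amdkfd_fence, not the kernel definitions. The sketch assumes only what is described above: the downcast yields NULL for a NULL pointer or for a fence whose ops table is not the KFD one, and the callback uses the suggested ternary.

```c
#include <assert.h>
#include <stddef.h>

struct mock_fence;

/* Hypothetical stand-in for struct dma_fence_ops. */
struct mock_fence_ops {
	const char *(*get_timeline_name)(struct mock_fence *f);
};

/* Hypothetical stand-in for struct dma_fence. */
struct mock_fence {
	const struct mock_fence_ops *ops;
};

/* Hypothetical stand-in for struct amdgpu_amdkfd_fence (embeds the base fence). */
struct mock_kfd_fence {
	struct mock_fence base;
	const char *timeline_name;
};

/* Recover the containing struct from a pointer to an embedded member. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static const char *mock_kfd_get_timeline_name(struct mock_fence *f);

static const struct mock_fence_ops mock_kfd_fence_ops = {
	.get_timeline_name = mock_kfd_get_timeline_name,
};

/* Downcast: NULL for a NULL fence or for a fence with foreign ops. */
static struct mock_kfd_fence *to_mock_kfd_fence(struct mock_fence *f)
{
	if (!f || f->ops != &mock_kfd_fence_ops)
		return NULL;
	return mock_container_of(f, struct mock_kfd_fence, base);
}

/* Only reachable via mock_kfd_fence_ops, so the downcast cannot fail here;
 * the ternary exists to show a static checker that fence is tested. */
static const char *mock_kfd_get_timeline_name(struct mock_fence *f)
{
	struct mock_kfd_fence *fence = to_mock_kfd_fence(f);

	return fence ? fence->timeline_name : NULL;
}
```

Because the callback is only installed in the KFD ops table, the NULL branch of the ternary is dead in practice; it costs nothing and keeps the checker quiet without changing behavior.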
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index ba99e0f258ae..42fa137bdb2f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -517,7 +517,9 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)
for (i = 0; i < target->n_pdds; i++) {
struct kfd_topology_device *topo_dev =
- kfd_topology_device_by_id(target->pdds[i]->dev->id);
+ kfd_topology_device_by_id(target->pdds[i]->dev->id);
+ if (!topo_dev)
+ continue;
This loop checks the capabilities of all the devices. If a device is not
found, we should assume the worst and return an error, rather than
assuming it will be fine.
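A user-space sketch of that suggestion, under stated assumptions: `mock_topology_by_id` stands in for kfd_topology_device_by_id(), `MOCK_CAP_PRECISE_MEM_OPS` for the real HSA capability bit, and the error codes (-EINVAL for a missing device, -EACCES for a missing capability) are assumptions, not values confirmed in this thread. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not the real HSA_CAP_* value. */
#define MOCK_CAP_PRECISE_MEM_OPS 0x1u

/* Hypothetical stand-in for struct kfd_topology_device. */
struct mock_topo_dev {
	uint32_t gpu_id;
	uint32_t capability;
};

/* Hypothetical stand-in for kfd_topology_device_by_id(): NULL if not found. */
static struct mock_topo_dev *mock_topology_by_id(struct mock_topo_dev *table,
						 size_t n_table, uint32_t id)
{
	for (size_t i = 0; i < n_table; i++)
		if (table[i].gpu_id == id)
			return &table[i];
	return NULL;
}

/*
 * Check every device the process uses. A device missing from the topology
 * fails the whole call instead of being skipped with "continue".
 */
static int mock_check_devices(struct mock_topo_dev *table, size_t n_table,
			      const uint32_t *ids, size_t n_ids,
			      bool want_precise_mem_ops)
{
	for (size_t i = 0; i < n_ids; i++) {
		struct mock_topo_dev *topo_dev =
			mock_topology_by_id(table, n_table, ids[i]);

		if (!topo_dev)
			return -EINVAL; /* assumed error code */
		if (want_precise_mem_ops &&
		    !(topo_dev->capability & MOCK_CAP_PRECISE_MEM_OPS))
			return -EACCES; /* assumed error code */
	}
	return 0;
}
```

A missing device fails the check even when the precise-memory flag is not requested, which is the "assume the worst" behavior rather than silently treating the device as capable.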
uint32_t caps = topo_dev->node_props.capability;
if (!(caps & HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
@@ -1071,6 +1073,8 @@ int kfd_dbg_trap_device_snapshot(struct kfd_process *target,
for (i = 0; i < tmp_num_devices; i++) {
struct kfd_process_device *pdd = target->pdds[i];
struct kfd_topology_device *topo_dev =
kfd_topology_device_by_id(pdd->dev->id);
+ if (!topo_dev)
+ continue;
I'd return an error here as well, because we obviously don't have
accurate device info.
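For the snapshot, skipping a device would hand the caller a result that silently omits it. A minimal sketch of returning an error instead, assuming hypothetical `mock_pdd` and `mock_device_info` types whose fields loosely mirror the gpu_id and exception_status copies in the patch; a NULL `topo_caps` models a failed topology lookup, and -EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kfd_process_device. */
struct mock_pdd {
	uint32_t gpu_id;
	uint64_t exception_status;
	const uint32_t *topo_caps; /* NULL models a failed topology lookup */
};

/* Hypothetical stand-in for the per-device snapshot entry. */
struct mock_device_info {
	uint32_t gpu_id;
	uint64_t exception_status;
	uint32_t capability;
};

/*
 * Fill one entry per device. If any topology lookup fails, report an
 * error rather than returning a snapshot that silently omits a device.
 */
static int mock_device_snapshot(const struct mock_pdd *pdds, size_t n,
				struct mock_device_info *out)
{
	for (size_t i = 0; i < n; i++) {
		if (!pdds[i].topo_caps)
			return -EINVAL; /* assumed error code */

		out[i].gpu_id = pdds[i].gpu_id;
		out[i].exception_status = pdds[i].exception_status;
		out[i].capability = *pdds[i].topo_caps;
	}
	return 0;
}
```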
device_info.gpu_id = pdd->dev->id;
device_info.exception_status = pdd->exception_status;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index f5d173f1ca3b..f40d93f82ede 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1685,6 +1685,9 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd,
struct kfd_node *dev;
int ret;
+ if (!pdd)
+ return -EINVAL;
+
We generally assume that functions are called with valid parameters.
By that argument, we should probably remove the !drm_file check
below as well.
Regards,
Felix
if (!drm_file)
return -EINVAL;
Probably easier to just combine the !pdd and !drm_file checks into a single check.
Kent
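Combining the two checks amounts to a single condition. A minimal sketch with hypothetical stand-in types (`mock_pdd`, `mock_file`) and a hypothetical `mock_init_vm`, not the real kfd_process_device_init_vm() signature:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kfd_process_device and struct file. */
struct mock_pdd { int id; };
struct mock_file { int fd; };

/* Reject either missing argument with one check. */
static int mock_init_vm(struct mock_pdd *pdd, struct mock_file *drm_file)
{
	if (!pdd || !drm_file)
		return -EINVAL;

	/* ... VM initialization would follow here ... */
	return 0;
}
```

If the caller-contract approach from Felix's comment is adopted instead, both checks are dropped and the function relies on its callers to pass valid pointers.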
--
2.43.0