On 2025-11-14 12:02, Russell, Kent wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Andrew
Martin
Sent: Friday, November 14, 2025 9:41 AM
To: [email protected]
Cc: Martin, Andrew <[email protected]>
Subject: [PATCH] drm/amdkfd: FORWARD NULL

This patch fixes issues when the code moves forward with a potential
NULL pointer, without checking.

Signed-off-by: Andrew Martin <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c           | 6 +++++-
  drivers/gpu/drm/amd/amdkfd/kfd_process.c         | 3 +++
  3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 1ef758ac5076..71b7db5de69f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -104,6 +104,8 @@ static const char *amdkfd_fence_get_driver_name(struct
dma_fence *f)
  static const char *amdkfd_fence_get_timeline_name(struct dma_fence *f)
  {
       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+     if (!fence)
+             return NULL;
Felix can hopefully confirm, but we may need to have something here, since the 
references here expect something. Like in
#define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
          
job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)

For amdgpu Job fences the above makes sense. But KFD fences are our evictions fences. There is no job associated with them.

I don't think the NULL check is needed here. to_amdgpu_amdkfd_fence returns NULL if the f is NULL or the fence is not an amdgpu_amdkfd_fence, based on the fence_ops. But we're in a fence_ops callback here that should only be called for amdgpu_amdkfd_fences.

That said, if you need a check to satisfy a static checker, I suggest this:

    return fence ? fence->timeline_name : NULL;



       return fence->timeline_name;
  }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index ba99e0f258ae..42fa137bdb2f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -517,7 +517,9 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target,
uint32_t *flags)

       for (i = 0; i < target->n_pdds; i++) {
               struct kfd_topology_device *topo_dev =
-                             
kfd_topology_device_by_id(target->pdds[i]->dev->id);
+                     kfd_topology_device_by_id(target->pdds[i]->dev->id);
+             if (!topo_dev)
+                     continue;

This loop checks the capabilities of all the devices. If a device is not found, we should assume the worst and return an error, instead of just assuming that it'll be fine.


               uint32_t caps = topo_dev->node_props.capability;

               if (!(caps &
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED)
&&
@@ -1071,6 +1073,8 @@ int kfd_dbg_trap_device_snapshot(struct kfd_process
*target,
       for (i = 0; i < tmp_num_devices; i++) {
               struct kfd_process_device *pdd = target->pdds[i];
               struct kfd_topology_device *topo_dev =
kfd_topology_device_by_id(pdd->dev->id);
+             if (!topo_dev)
+                     continue;

I'd return an error here as well, because we obviously don't have accurate device info.



               device_info.gpu_id = pdd->dev->id;
               device_info.exception_status = pdd->exception_status;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index f5d173f1ca3b..f40d93f82ede 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1685,6 +1685,9 @@ int kfd_process_device_init_vm(struct
kfd_process_device *pdd,
       struct kfd_node *dev;
       int ret;

+     if (!pdd)
+             return -EINVAL;
+

We generally assume that functions are called with valid parameters. With that argument, we should probably remove the check for !drm_file below as well.

Regards,
  Felix


       if (!drm_file)
               return -EINVAL;
Probably easier to just combine the !pdd and !drm_file into the same check.

  Kent
--
2.43.0

Reply via email to