Re: [PATCH 1/2] drm/amd: Fix more cases of NULL pointer deref at fini

Lazar, Lijo Sat, 07 Mar 2026 22:11:26 -0800



On 08-Mar-26 1:48 AM, Mario Limonciello wrote:

On 3/6/2026 12:41 AM, Lazar, Lijo wrote:
On 06-Mar-26 11:40 AM, Mario Limonciello wrote:
On 3/5/2026 11:07 PM, Lazar, Lijo wrote:
On 06-Mar-26 3:35 AM, Mario Limonciello wrote:
I found more cases where a NULL version causes problems.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Signed-off-by: Mario Limonciello <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
  1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bc6f714e8763a..74cbe58484fe2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3463,6 +3463,9 @@ static void amdgpu_ip_block_hw_fini(struct amdgpu_ip_block *ip_block)
      struct amdgpu_device *adev = ip_block->adev;
      int r;
+    if (!ip_block->version)
+        return;
+
ip block versions are set during the discovery phase itself. This is a very early init failure
Yes; this case is an NPI system where not all blocks are in discovery yet. The system panics at bootup with a NULL ptr deref in multiple places instead of recovering cleanly and keeping fbdev. This patch series sorts it out.
Blocks not in discovery shouldn't be added to the ip list, or should be added differently.
and ideally the fix should be not to call any fini for such an early failure.
As an alternative to this series?
Yes, if it's a failure as early as the discovery stage, we should probably skip amdgpu_device_fini_hw altogether.
I experimented some more and feel that the solution I came up with is correct. There are valid versions of everything at this time (the failed IP block isn't there at that time).

My understanding of the situation is that this is an early exit, since the driver doesn't recognize one IP block and hence cannot assign the corresponding version functions. Without the discovery mechanism, the equivalent case is the driver not detecting the device id. In both cases, there shouldn't be any need to run through the sw/hw fini sequences of the ip blocks.

So how would you know to skip fini? I guess check asic_funcs not to be NULL?

One way is to undo the effect of set_ip_block within discovery itself. For example, if there is a discovery error, call amdgpu_ip_block_clear() or similar and remove any added ip blocks. num_ip_blocks will then be 0, and in such cases don't run through any unwind sequence (it shouldn't really be needed then). The same applies if the driver is not able to detect a valid discovery binary blob.
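
The idea above can be illustrated with a small userspace sketch. It is not driver code: every mock_* name below is a hypothetical stand-in invented for illustration, and mock_ip_block_clear() only plays the role of the "amdgpu_ip_block_clear() or similar" helper suggested above, which does not exist in the patch under review. The sketch assumes an unrecognized block shows up as a NULL version during registration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_IP_BLOCKS 16

/* Mock stand-ins, NOT the real amdgpu structures. */
struct mock_ip_version {
	const char *name;
};

struct mock_ip_block {
	const struct mock_ip_version *version;
};

struct mock_device {
	struct mock_ip_block ip_blocks[MOCK_MAX_IP_BLOCKS];
	int num_ip_blocks;
	int fini_calls; /* counts blocks that went through teardown */
};

/* Hypothetical helper: forget every block registered so far. */
static void mock_ip_block_clear(struct mock_device *dev)
{
	memset(dev->ip_blocks, 0, sizeof(dev->ip_blocks));
	dev->num_ip_blocks = 0;
}

/*
 * Register the discovered blocks. An unrecognized block is modelled as
 * a NULL entry (assumption). On the first one, undo all earlier
 * registrations and report failure, so no half-built list survives.
 */
static int mock_discovery_set_ip_blocks(struct mock_device *dev,
					const struct mock_ip_version *const *found,
					int count)
{
	for (int i = 0; i < count; i++) {
		if (!found[i] || dev->num_ip_blocks >= MOCK_MAX_IP_BLOCKS) {
			mock_ip_block_clear(dev);
			return -1;
		}
		dev->ip_blocks[dev->num_ip_blocks++].version = found[i];
	}
	return 0;
}

/* Teardown: nothing to unwind when discovery left the list empty. */
static void mock_device_fini(struct mock_device *dev)
{
	if (dev->num_ip_blocks == 0)
		return;

	for (int i = dev->num_ip_blocks - 1; i >= 0; i--) {
		/* Holds by construction: only valid versions are registered. */
		assert(dev->ip_blocks[i].version != NULL);
		dev->fini_calls++;
	}
}
```

With this shape the fini path needs no per-block NULL checks, because a failed discovery never leaves a block with a NULL version in the list. Whether that fits the real driver depends on what else has been initialized by that point, which the sketch does not model.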


Thanks,
Lijo

But then it's the same as the second commit is doing already.


Thanks,
Lijo


Thanks,
Lijo

      if (!ip_block->version->funcs->hw_fini) {
          dev_err(adev->dev, "hw_fini of IP block <%s> not defined\n",
              ip_block->version->funcs->name);

@@ -3496,6 +3499,8 @@ static void amdgpu_device_smu_fini_early(struct amdgpu_device *adev)

      for (i = 0; i < adev->num_ip_blocks; i++) {
          if (!adev->ip_blocks[i].status.hw)
              continue;
+        if (!adev->ip_blocks[i].version)
+            continue;

          if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SMC) {

              amdgpu_ip_block_hw_fini(&adev->ip_blocks[i]);
              break;

Re: [PATCH 1/2] drm/amd: Fix more cases of NULL pointer deref at fini