On 3/6/2026 12:41 AM, Lazar, Lijo wrote:
On 06-Mar-26 11:40 AM, Mario Limonciello wrote:
On 3/5/2026 11:07 PM, Lazar, Lijo wrote:
On 06-Mar-26 3:35 AM, Mario Limonciello wrote:
I found more cases where a NULL version causes problems.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bc6f714e8763a..74cbe58484fe2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3463,6 +3463,9 @@ static void amdgpu_ip_block_hw_fini(struct amdgpu_ip_block *ip_block)
struct amdgpu_device *adev = ip_block->adev;
int r;
+ if (!ip_block->version)
+ return;
+
IP block versions are set during the discovery phase itself. This is a
very early init failure
Yes; this case is an NPI system where not all blocks are in discovery yet.
The system panics at bootup with NULL pointer dereferences in multiple
places instead of recovering cleanly and keeping fbdev. This patch series
sorts it out.
Blocks not in discovery shouldn't be added to the IP list, or should be
added differently.
and ideally the fix should be not to call any fini for such an early
failure.
As an alternative to this series?
Yes, if the failure is as early as the discovery stage, we should
probably skip amdgpu_device_fini_hw altogether.
I experimented some more and feel that the solution I came up with is
correct. There are valid versions of everything at this time (the failed
IP block isn't there at that time).
So how would you know to skip fini? I guess by checking that asic_funcs
isn't NULL?
But that's the same as what the second commit is already doing.
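
For anyone following along, here is a rough userspace mock of the guard
being discussed. It is only a sketch: the mock_* types and functions are
hypothetical stand-ins, not the real amdgpu definitions, and clearing the
hw flag after fini is an assumption made to keep the mock self-contained.
It shows that a block whose version was never populated is skipped rather
than dereferenced, while a later SMC block is still torn down.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the amdgpu types; not the kernel definitions. */
enum mock_block_type { MOCK_TYPE_GFX, MOCK_TYPE_SMC };

struct mock_ip_version {
	enum mock_block_type type;
	void (*hw_fini)(void);
};

struct mock_ip_block {
	bool hw;                               /* plays the role of status.hw */
	const struct mock_ip_version *version; /* NULL if never populated */
};

static int fini_calls;

static void mock_hw_fini(void)
{
	fini_calls++;
}

/*
 * Tear down the first initialized SMC block, skipping any block whose
 * version pointer is NULL. Returns the index of the block that was
 * torn down, or -1 if none qualified.
 */
static int mock_smu_fini_early(struct mock_ip_block *blocks, int n)
{
	for (int i = 0; i < n; i++) {
		if (!blocks[i].hw)
			continue;
		if (!blocks[i].version) /* the guard under discussion */
			continue;
		if (blocks[i].version->type == MOCK_TYPE_SMC) {
			if (blocks[i].version->hw_fini)
				blocks[i].version->hw_fini();
			blocks[i].hw = false; /* assumption of this mock */
			return i;
		}
	}
	return -1;
}
```

With an array holding one block with a NULL version followed by an SMC
block, the first is skipped and the second is finalized exactly once.
Without the guard, the type check on the first block would dereference
NULL.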
Thanks,
Lijo
Thanks,
Lijo
if (!ip_block->version->funcs->hw_fini) {
dev_err(adev->dev, "hw_fini of IP block <%s> not defined\n",
ip_block->version->funcs->name);
@@ -3496,6 +3499,8 @@ static void amdgpu_device_smu_fini_early(struct amdgpu_device *adev)
for (i = 0; i < adev->num_ip_blocks; i++) {
if (!adev->ip_blocks[i].status.hw)
continue;
+ if (!adev->ip_blocks[i].version)
+ continue;
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SMC) {
amdgpu_ip_block_hw_fini(&adev->ip_blocks[i]);
break;