On 20-Mar-26 3:25 PM, Jesse.Zhang wrote:
Some platforms report an invalidly large IP discovery TMR size, which leads
amdgpu_discovery_init() to attempt a large kmalloc allocation and trigger
page allocator warnings/failures during probe.

Observed log excerpt:
   WARNING: mm/page_alloc.c:5216 at __alloc_frozen_pages_noprof+0x29e/0x340
   ...
   ___kmalloc_large_node+0xf2/0x130
   __kmalloc_noprof+0x442/0x6b0
   amdgpu_discovery_init+0x161/0xa00 [amdgpu]
  Fatal error during GPU init
  probe with driver amdgpu failed with error -12

This looks like a different issue. Do you have a trace of which path it takes and the value seen?

Thanks,
Lijo


Fix by:
- validating discovery size and falling back to DISCOVERY_TMR_SIZE when
   size is zero or out of expected range;
- using kvzalloc() for discovery buffer allocation to avoid high-order
   contiguous-page allocation failures;
- using kvfree() on all release paths.
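
For illustration only, the size check in the first item can be sketched in
user-space C. This is not the driver code: SKETCH_DEFAULT_TMR_SIZE and
sketch_sanitize_tmr_size() are hypothetical names, and the 10 KiB value is
an assumed placeholder rather than the real DISCOVERY_TMR_SIZE.

```c
#include <stdint.h>

/* Hypothetical stand-in for DISCOVERY_TMR_SIZE; the value is an assumption. */
#define SKETCH_DEFAULT_TMR_SIZE (10u << 10)

/*
 * Return the size to allocate: the reported size when it is non-zero and
 * no larger than the default, otherwise the default itself.
 */
uint32_t sketch_sanitize_tmr_size(uint32_t reported)
{
	if (reported == 0 || reported > SKETCH_DEFAULT_TMR_SIZE)
		return SKETCH_DEFAULT_TMR_SIZE;
	return reported;
}
```

With this clamp in place the allocation request stays bounded by the default
size regardless of what the platform reports.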

Signed-off-by: Jesse Zhang <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 19 ++++++++++++++++---
  1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 5a4e63e1ad93..a6b49378c495 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -329,7 +329,20 @@ static int amdgpu_discovery_get_tmr_info(struct amdgpu_device *adev,
                }
        }
  out:
-       adev->discovery.bin = kzalloc(adev->discovery.size, GFP_KERNEL);
+       if (!adev->discovery.size || adev->discovery.size > DISCOVERY_TMR_SIZE) {
+               dev_warn(adev->dev,
+                        "invalid discovery size 0x%x, fallback to default 0x%x\n",
+                        adev->discovery.size, DISCOVERY_TMR_SIZE);
+               /*
+                * Some platforms may expose garbage TMR size through scratch/ACPI.
+                * Fall back to legacy layout in VRAM when available.
+                */
+               if (!*is_tmr_in_sysmem && vram_size)
+                       adev->discovery.offset = (vram_size << 20) - DISCOVERY_TMR_OFFSET;
+               adev->discovery.size = DISCOVERY_TMR_SIZE;
+       }
+
+       adev->discovery.bin = kvzalloc(adev->discovery.size, GFP_KERNEL);
        if (!adev->discovery.bin)
                return -ENOMEM;
        adev->discovery.debugfs_blob.data = adev->discovery.bin;
@@ -694,7 +707,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
        return 0;
out:
-       kfree(adev->discovery.bin);
+       kvfree(adev->discovery.bin);
        adev->discovery.bin = NULL;
        if ((amdgpu_discovery != 2) &&
            (RREG32(mmIP_DISCOVERY_VERSION) == 4))
@@ -707,7 +720,7 @@ static void amdgpu_discovery_sysfs_fini(struct amdgpu_device *adev);
  void amdgpu_discovery_fini(struct amdgpu_device *adev)
  {
        amdgpu_discovery_sysfs_fini(adev);
-       kfree(adev->discovery.bin);
+       kvfree(adev->discovery.bin);
        adev->discovery.bin = NULL;
  }
