On Monday 03/09 at 21:24 -0700, Calvin Owens wrote:
> Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP
> footer size") introduced a use of an uninitialized stack variable
> in dm_dmub_sw_init() (region_params.bss_data_size).
> 
> Interestingly, this seems to cause no issue on normal kernels. But when
> full LTO is enabled, it causes the compiler to "optimize" out huge
> swaths of amdgpu initialization code, and the driver is unusable:
> 
>     amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00
>     amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5
>     amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
>     amdgpu 0000:03:00.0: Fatal error during GPU init
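To make the failure mode above a little more concrete, here is a minimal userspace C sketch of the general pattern. It is not the real amdgpu code. The struct region_params_sketch, the function footer_fits(), and the size check are hypothetical stand-ins, and the check is not the one added by e1b385726f7f. It shows a stack struct whose field is read without ever being written, and the usual fix of zero-initializing the struct so that every field has a defined value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a firmware region params struct.
 * NOT the real amdgpu/DMUB definition. */
struct region_params_sketch {
	uint32_t inst_const_size;
	uint32_t bss_data_size;
	uint32_t fw_size;
};

/*
 * Buggy pattern (do not use): bss_data_size is never written before
 * it is read, so the comparison uses an indeterminate stack value.
 * Compilers may treat such a value as "anything", which can license
 * surprising optimizations, especially with whole-program LTO.
 *
 *     struct region_params_sketch p;        // fields indeterminate
 *     p.inst_const_size = inst_const_size;
 *     p.fw_size = fw_size;
 *     if (p.inst_const_size + p.bss_data_size > p.fw_size)
 *             return false;
 */

/* Fixed pattern: zero-initialize so every field starts at a defined 0. */
static bool footer_fits(uint32_t inst_const_size, uint32_t fw_size)
{
	struct region_params_sketch p = {0};

	p.inst_const_size = inst_const_size;
	p.fw_size = fw_size;

	/* Hypothetical check. Widen to 64 bits so the sum cannot wrap.
	 * bss_data_size is a defined 0 here, not stack garbage. */
	return (uint64_t)p.inst_const_size + p.bss_data_size <= p.fw_size;
}

/* Small usage example for the sketch above. */
void footer_fits_selftest(void)
{
	assert(footer_fits(10, 20));
	assert(footer_fits(20, 20));
	assert(!footer_fits(21, 20));
}
```

Zero-initializing with = {0} is one common way to close this class of bug. Explicitly assigning the field on every path before it is read would work as well.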

In case anybody wants to poke around, I uploaded the binaries here:

    https://github.com/jcalvinowens/lkml-debug/releases/tag/000001

You can see in the diff of the disassembly that the "missing" piece of
dm_sw_init() reappeared after reverting e1b38572:

    https://github.com/jcalvinowens/lkml-debug/blob/main/amdgpu-lto/not-working-to-working.diff

This is my bisect log:

    bad: [1f318b96cc84d7c2ab792fcc0bfd42a7ca890681] Linux 7.0-rc3
    good: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
    bad: [1c2b4a4c2bcb950f182eeeb33d94b565607608cf] Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
    good: [6589b3d76db2d6adbf8f2084c303fb24252a0dc6] Merge tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
    bad: [a60f627cf4ab474aebf15f62c55eadabab9780da] Merge tag 'amd-drm-next-6.20-2026-01-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
    good: [83675851547e835c15252c601f41acf269c351d9] drm/xe: Cleanup unused header includes
    bad: [71573db5ad74b2087a4688cd1dda73ff082620f6] drm/amd/display: switch to drm_dbg_ macros instead of DRM_DEBUG_ variants
    bad: [3235a5b72317be613b69e22c3b2c9f2bec546253] drm/amdgpu: Update MES VM_CNTX_CNTL for XNACK off for GFX 12.1
    bad: [e1b73b64271d706079370b58b81292dafd373163] amdkfd: remove DIQ support
    good: [2634ef1b8c00207dde5101e926241957aa5652b8] drm/amdkfd: Fix PTE clearing during SVM unmap on GFX 12.1
    bad: [af441be8b75deb93ded51c54b9a2ba1e048b1c91] drm/amdgpu: add support for sdma v7_1
    good: [69249b477b95f91e56bb19ec53707253899458c4] drm/amd/display: Move dml2_validate to the non-FPU dml2_wrapper
    bad: [ec62b7ded978957ec74add4c1feccc986e2baeef] drm/amdkfd: Uninitialized and Unused variables
    good: [c7062be3380cb20c8b1c4a935a13f1848ead0719] drm/amd/display: Correct DSC padding accounting
    bad: [d28e92093ceffb424b9b0e36bbd391c83b1cfe78] drm/amd/display: [FW Promotion] Release 0.1.37.0
    bad: [e1b385726f7f7fc75b6cd3c2216430de8a625a2d] drm/amd/display: Add additional checks for PSP footer size
    first bad commit: [e1b385726f7f7fc75b6cd3c2216430de8a625a2d] drm/amd/display: Add additional checks for PSP footer size

Thanks,
Calvin
