Hi Calvin,
On Mon, Mar 09, 2026 at 09:24:57PM -0700, Calvin Owens wrote:
> Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP
> footer size") introduced a use of an uninitialized stack variable
> in dm_dmub_sw_init() (region_params.bss_data_size).
>
> Interestingly, this seems to cause no issue on normal kernels. But when
> full LTO is enabled, it causes the compiler to "optimize" out huge
> swaths of amdgpu initialization code, and the driver is unusable:

Yeah, this appears to be a very unfortunate case of "clang encountered
known undefined behavior and stopped code generation", which we would
like to avoid, but figuring out a proper upstreamable solution is hard.
The most recent attempt:

  https://github.com/llvm/llvm-project/pull/146791

My guess is that LTO allows inlining of
dmub_srv_get_fw_meta_info_from_raw_fw() into dm_dmub_sw_init(), at which
point clang can see that the uninitialized region_params.bss_data_size
feeds into fw_meta_info_params.fw_bss_data, which the inlined code then
uses, so it gives up generating the rest of the function.
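
For anyone following along, the shape of the problem can be sketched
outside the kernel roughly like this. All of the names below
(struct region_params, struct meta_params, has_bss_data(),
sw_init_sketch()) are hypothetical stand-ins rather than the real
amdgpu/dmub definitions, and the buggy read is left in a comment so that
the snippet itself stays well-defined:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real amdgpu/dmub definitions. */
struct region_params {
	uint32_t bss_data_size;
};

struct meta_params {
	uint32_t bss_data_size;
	const uint8_t *fw_bss_data;
};

/* Stand-in for the helper that LTO can inline into its caller. */
static int has_bss_data(const struct meta_params *p)
{
	return p->fw_bss_data != NULL;
}

/* Returns 1 when a BSS pointer was set up, 0 otherwise. */
static int sw_init_sketch(const uint8_t *fw, uint32_t hdr_bss_bytes)
{
	struct region_params region;	/* no initializer, as in the report */
	struct meta_params meta;

	meta.bss_data_size = hdr_bss_bytes;

	/*
	 * Buggy form (not compiled here):
	 *	meta.fw_bss_data = region.bss_data_size ? fw : NULL;
	 * That reads region.bss_data_size before anything wrote it.
	 * Once has_bss_data() is inlined, the compiler can see the
	 * indeterminate value feeding the branch.
	 */
	meta.fw_bss_data = meta.bss_data_size ? fw : NULL;

	/* In this sketch, region is only written after the check. */
	region.bss_data_size = meta.bss_data_size;
	(void)region;

	return has_bss_data(&meta);
}
```

With a non-NULL fw, sw_init_sketch(fw, 16) returns 1 and
sw_init_sketch(fw, 0) returns 0, since the check now depends only on a
field that has already been written.
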
> amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP:
> version=0x07002F00
> amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5
> amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
> amdgpu 0000:03:00.0: Fatal error during GPU init
>
> It surprises me that neither gcc nor clang emit a warning about this: I
> only found it by bisecting the LTO breakage.

gcc's -Wmaybe-uninitialized is disabled by default for the kernel, but
even enabling it with KCFLAGS does not show an instance here, which I
find quite surprising. For clang, it is harder because the warning is
emitted early in the frontend, where it might not be able to track a
value that well.

> Fix by using the old value for region_params.bss_data_size in place of
> the uninitialized reference, which makes amdgpu work with LTO again.
>
> Fixes: e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer
> size")
> Signed-off-by: Calvin Owens <[email protected]>
> ---
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index b3d6f2cd8ab6..e69e61163ae9 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2554,7 +2554,7 @@ static int dm_dmub_sw_init(struct amdgpu_device *adev)
> fw_meta_info_params.fw_inst_const = adev->dm.dmub_fw->data +
>
> le32_to_cpu(hdr->header.ucode_array_offset_bytes) +
> PSP_HEADER_BYTES_256;
> - fw_meta_info_params.fw_bss_data = region_params.bss_data_size ?
> adev->dm.dmub_fw->data +
> + fw_meta_info_params.fw_bss_data = le32_to_cpu(hdr->bss_data_bytes) ?
> adev->dm.dmub_fw->data +

Maybe it would be better to use fw_meta_info_params.bss_data_size
instead of le32_to_cpu(hdr->bss_data_bytes)? Obviously it is the same
value, but it would result in a smaller change. It seems likely that
this was just a copy-and-paste mistake.

>
> le32_to_cpu(hdr->header.ucode_array_offset_bytes) +
> le32_to_cpu(hdr->inst_const_bytes) :
> NULL;
> fw_meta_info_params.custom_psp_footer_size = 0;
> --
> 2.47.3
>