Public
> -----Original Message-----
> From: Sunil Khatri <[email protected]>
> Sent: Monday, June 1, 2026 12:24 PM
> To: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>
> Cc: [email protected]; Khatri, Sunil <[email protected]>
> Subject: [PATCH 1/3] drm/amdgpu: validate the mes firmware version for
> gfx11
>
> MES fw should report the fw version same either read from the register or if
> read from the firmware ucode. That is not the case for MES firmware and we
> add a warning in case it is not same.
>
> Signed-off-by: Sunil Khatri <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 12 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 +
> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 +
> 3 files changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> index c9467b26e42c..e5e1ceabcbc5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> @@ -781,6 +781,18 @@ int amdgpu_mes_init_microcode(struct
> amdgpu_device *adev, int pipe)
> return r;
> }
>
> +void amdgpu_mes_validate_fw_version(struct amdgpu_device *adev) {
> + u32 fw_from_ucode = adev-
> >mes.fw_version[AMDGPU_MES_SCHED_PIPE];
> + u32 fw_from_reg = adev->mes.sched_version &
> AMDGPU_MES_VERSION_MASK;
> +
> + if (fw_from_ucode != fw_from_reg)
> + dev_warn(adev->dev,
> + "MES FW version mismatch: ucode=0x%x
> register=0x%x\n",
> + fw_from_ucode, fw_from_reg);
Rather than a warning, maybe just dev_info? I'm concerned this will generates
a lot of useless bug reports. There's nothing actually wrong with the
firmware, the version is just wrong in the ucode binary. Perhaps reword the
message to say something like: "firmware reports incorrect version in ucode
binary (0x%x vs. 0x%x)."
Alex
> +}
> +
> +
> bool amdgpu_mes_suspend_resume_all_supported(struct amdgpu_device
> *adev) {
> uint32_t mes_rev = adev->mes.sched_version &
> AMDGPU_MES_VERSION_MASK; diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index 93990d4990f2..fdd06a17520a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -441,6 +441,7 @@ struct amdgpu_mes_funcs {
> (adev)->mes.kiq_hw_fini((adev), (xcc_id))
>
> int amdgpu_mes_init_microcode(struct amdgpu_device *adev, int pipe);
> +void amdgpu_mes_validate_fw_version(struct amdgpu_device *adev);
> int amdgpu_mes_init(struct amdgpu_device *adev); void
> amdgpu_mes_fini(struct amdgpu_device *adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index a926a330700e..0db378d126fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -1686,6 +1686,7 @@ static int mes_v11_0_hw_init(struct
> amdgpu_ip_block *ip_block)
> if (r)
> goto failure;
>
> + amdgpu_mes_validate_fw_version(adev);
> out:
> /*
> * Disable KIQ ring usage from the driver once MES is enabled.
> --
> 2.34.1