On Thu, Sep 18, 2025 at 5:35 PM Mario Limonciello
<mario.limoncie...@amd.com> wrote:
>
> On 9/18/2025 2:05 PM, Alex Deucher wrote:
> > On Thu, Sep 18, 2025 at 2:59 PM Mario Limonciello
> > <mario.limoncie...@amd.com> wrote:
> >>
> >> The MES set resources packet has an optional bit 'lr_compute_wa'
> >> which can be used for preventing MES hangs on long compute jobs.
> >>
> >> Set this bit by default.
> >>
> >> Co-developed-by: Yifan Zhang <yifan1.zh...@amd.com>
> >> Signed-off-by: Yifan Zhang <yifan1.zh...@amd.com>
> >> Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
> >
> > Presumably this bit will be ignored on old firmwares?  If not, we'll
> > need a firmware version check.  Assuming this works correctly on old
> > firmwares,
>
> I'm assuming it does get ignored, but maybe Yifan can confirm it.

Might be good to add a FW version check anyway just in case and also
so that it's more obvious when the user has a new enough firmware to
contain the fix.

Alex

>
> > Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> >
> > Alex
> >
> >> ---
> >> v2:
> >>   * drop module parameter
> >>   * add more description to commit text
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 2 ++
> >>   drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 1 +
> >>   drivers/gpu/drm/amd/include/mes_v11_api_def.h | 3 ++-
> >>   drivers/gpu/drm/amd/include/mes_v12_api_def.h | 3 ++-
> >>   4 files changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> index 3b91ea601add..540b514312b1 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> @@ -713,6 +713,8 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
> >>          mes_set_hw_res_pkt.enable_reg_active_poll = 1;
> >>          mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
> >>          mes_set_hw_res_pkt.oversubscription_timer = 50;
> >> +        mes_set_hw_res_pkt.enable_lr_compute_wa = 1;
> >> +
> >>          if (amdgpu_mes_log_enable) {
> >>                  mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
> >>                  mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr =
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> index 998893dff08e..01266eef65cb 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> @@ -769,6 +769,7 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe)
> >>          mes_set_hw_res_pkt.use_different_vmid_compute = 1;
> >>          mes_set_hw_res_pkt.enable_reg_active_poll = 1;
> >>          mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
> >> +        mes_set_hw_res_pkt.enable_lr_compute_wa = 1;
> >>
> >>          /*
> >>           * Keep oversubscribe timer for sdma . When we have unmapped doorbell
> >> diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> index 15680c3f4970..ab1cfc92dbeb 100644
> >> --- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> +++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> @@ -238,7 +238,8 @@ union MESAPI_SET_HW_RESOURCES {
> >>                          uint32_t enable_mes_sch_stb_log : 1;
> >>                          uint32_t limit_single_process : 1;
> >>                          uint32_t is_strix_tmz_wa_enabled :1;
> >> -                        uint32_t reserved : 13;
> >> +                        uint32_t enable_lr_compute_wa : 1;
> >> +                        uint32_t reserved : 12;
> >>                  };
> >>                  uint32_t uint32_t_all;
> >>          };
> >> diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> index c04bd351b250..69611c7e30e3 100644
> >> --- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> +++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> @@ -287,7 +287,8 @@ union MESAPI_SET_HW_RESOURCES {
> >>                          uint32_t limit_single_process : 1;
> >>                          uint32_t unmapped_doorbell_handling: 2;
> >>                          uint32_t enable_mes_fence_int: 1;
> >> -                        uint32_t reserved : 10;
> >> +                        uint32_t enable_lr_compute_wa : 1;
> >> +                        uint32_t reserved : 9;
> >>                  };
> >>                  uint32_t uint32_all;
> >>          };
> >> --
> >> 2.49.0
> >>
>
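
For reference, the FW version check suggested above could look roughly like the standalone sketch below. It is not the actual driver change. The minimum version (LR_COMPUTE_WA_MIN_FW_VERSION), the version mask, the trimmed-down union, and both helper names are hypothetical stand-ins, since the thread does not say which MES firmware first honours the bit. The bitfield mirrors the v11 layout from the patch (one new bit taken out of 'reserved', 13 -> 12), with the earlier flags collapsed into a single field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL threshold: the thread does not name the first fixed firmware. */
#define LR_COMPUTE_WA_MIN_FW_VERSION 0x80u

/* ASSUMED mask selecting the version bits of the raw firmware version word. */
#define MES_FW_VERSION_MASK 0x00000fffu

/*
 * Trimmed stand-in for the flag word in MESAPI_SET_HW_RESOURCES (v11).
 * Only the new bit is named; the 19 flags before it are lumped together.
 */
union hw_res_flags {
	struct {
		uint32_t other_flags : 19;
		uint32_t enable_lr_compute_wa : 1;
		uint32_t reserved : 12; /* was 13 before the new bit was added */
	};
	uint32_t uint32_t_all;
};

/* Adding the bit must not grow the flag word beyond one dword. */
static_assert(sizeof(union hw_res_flags) == sizeof(uint32_t),
	      "flag word must stay 32 bits wide");

/* True when the masked firmware version is new enough for the workaround. */
static bool fw_supports_lr_compute_wa(uint32_t raw_fw_version)
{
	return (raw_fw_version & MES_FW_VERSION_MASK) >=
	       LR_COMPUTE_WA_MIN_FW_VERSION;
}

/* Set the bit only on firmware known to understand it, clear it otherwise. */
static void set_lr_compute_wa(union hw_res_flags *flags, uint32_t raw_fw_version)
{
	flags->enable_lr_compute_wa =
		fw_supports_lr_compute_wa(raw_fw_version) ? 1 : 0;
}
```

Gating the bit this way would also give a single place to log whether the running firmware is new enough to contain the fix, which is the second benefit mentioned above.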