On Thu, Sep 18, 2025 at 5:35 PM Mario Limonciello
<mario.limoncie...@amd.com> wrote:
>
>
>
> On 9/18/2025 2:05 PM, Alex Deucher wrote:
> > On Thu, Sep 18, 2025 at 2:59 PM Mario Limonciello
> > <mario.limoncie...@amd.com> wrote:
> >>
> >> The MES set resources packet has an optional bit 'lr_compute_wa'
> >> which can be used for preventing MES hangs on long compute jobs.
> >>
> >> Set this bit by default.
> >>
> >> Co-developed-by: Yifan Zhang <yifan1.zh...@amd.com>
> >> Signed-off-by: Yifan Zhang <yifan1.zh...@amd.com>
> >> Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
> >
> > Presumably this bit will be ignored on old firmwares?  If not, we'll
> > need a firmware version check.  Assuming this works correctly on old
> > firmwares,
>
> I'm assuming it does get ignored, but maybe Yifan can confirm it.

Might be good to add a FW version check anyway just in case and also
so that it's more obvious when the user has a new enough firmware to
contain the fix.
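
A minimal sketch of what such a check could look like, written as a standalone mock rather than against the real driver. The mask, the threshold value, the struct, and every `example_` name are hypothetical placeholders; the firmware version that actually carries the fix is not stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real amdgpu definitions. */
#define EXAMPLE_MES_VERSION_MASK  0x00000fffu
#define EXAMPLE_MIN_LR_COMPUTE_FW 0x100u /* placeholder threshold, not a real version */

/* Mock of the flag word: only the bit discussed in the patch is modelled. */
struct example_hw_res_pkt {
	uint32_t enable_lr_compute_wa : 1;
	uint32_t reserved : 31;
};

/* True if the masked scheduler firmware version is new enough. */
static bool example_fw_has_lr_compute_wa(uint32_t sched_version)
{
	return (sched_version & EXAMPLE_MES_VERSION_MASK) >=
	       EXAMPLE_MIN_LR_COMPUTE_FW;
}

/*
 * Set the workaround bit only when the firmware is expected to honour it.
 * Returns whether the bit was set, so the caller can log a hint when the
 * firmware is too old to contain the fix.
 */
static bool example_set_lr_compute_wa(struct example_hw_res_pkt *pkt,
				      uint32_t sched_version)
{
	bool supported = example_fw_has_lr_compute_wa(sched_version);

	pkt->enable_lr_compute_wa = supported ? 1 : 0;
	return supported;
}
```

Returning the result lets the caller print a one-time message on old firmware, which covers the "more obvious to the user" part of the suggestion.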

Alex

>
> > Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> >
> > Alex
> >
> >> ---
> >> v2:
> >>   * drop module parameter
> >>   * add more description to commit text
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 2 ++
> >>   drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 1 +
> >>   drivers/gpu/drm/amd/include/mes_v11_api_def.h | 3 ++-
> >>   drivers/gpu/drm/amd/include/mes_v12_api_def.h | 3 ++-
> >>   4 files changed, 7 insertions(+), 2 deletions(-)
> >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> index 3b91ea601add..540b514312b1 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> >> @@ -713,6 +713,8 @@ static int mes_v11_0_set_hw_resources(struct 
> >> amdgpu_mes *mes)
> >>          mes_set_hw_res_pkt.enable_reg_active_poll = 1;
> >>          mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
> >>          mes_set_hw_res_pkt.oversubscription_timer = 50;
> >> +       mes_set_hw_res_pkt.enable_lr_compute_wa = 1;
> >> +
> >>          if (amdgpu_mes_log_enable) {
> >>                  mes_set_hw_res_pkt.enable_mes_event_int_logging = 1;
> >>                  mes_set_hw_res_pkt.event_intr_history_gpu_mc_ptr =
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> index 998893dff08e..01266eef65cb 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> >> @@ -769,6 +769,7 @@ static int mes_v12_0_set_hw_resources(struct 
> >> amdgpu_mes *mes, int pipe)
> >>          mes_set_hw_res_pkt.use_different_vmid_compute = 1;
> >>          mes_set_hw_res_pkt.enable_reg_active_poll = 1;
> >>          mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
> >> +       mes_set_hw_res_pkt.enable_lr_compute_wa = 1;
> >>
> >>          /*
> > >>           * Keep oversubscribe timer for sdma . When we have unmapped doorbell
> > >> diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> index 15680c3f4970..ab1cfc92dbeb 100644
> >> --- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> +++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
> >> @@ -238,7 +238,8 @@ union MESAPI_SET_HW_RESOURCES {
> >>                                  uint32_t enable_mes_sch_stb_log : 1;
> >>                                  uint32_t limit_single_process : 1;
> >>                                  uint32_t is_strix_tmz_wa_enabled  :1;
> >> -                               uint32_t reserved : 13;
> >> +                               uint32_t enable_lr_compute_wa : 1;
> >> +                               uint32_t reserved : 12;
> >>                          };
> >>                          uint32_t        uint32_t_all;
> >>                  };
> > >> diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> index c04bd351b250..69611c7e30e3 100644
> >> --- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> +++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> >> @@ -287,7 +287,8 @@ union MESAPI_SET_HW_RESOURCES {
> >>                                  uint32_t limit_single_process : 1;
> >>                                  uint32_t unmapped_doorbell_handling: 2;
> >>                                  uint32_t enable_mes_fence_int: 1;
> >> -                               uint32_t reserved : 10;
> >> +                               uint32_t enable_lr_compute_wa : 1;
> >> +                               uint32_t reserved : 9;
> >>                          };
> >>                          uint32_t uint32_all;
> >>                  };
> >> --
> >> 2.49.0
> >>
>
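
The header hunks in the quoted patch take one bit out of `reserved` for the new flag (13 becomes 1 + 12 in v11, 10 becomes 1 + 9 in v12), so the flag word keeps its size. A rough sketch of how that bookkeeping could be checked at compile time follows; the union is a hypothetical mock of the v11 layout, and `earlier_flags` is a placeholder for the fields not shown in the diff, assuming the whole struct occupies exactly one 32-bit word.

```c
#include <stdint.h>

/* Hypothetical mock; only the last five fields mirror the quoted patch. */
union example_v11_flags {
	struct {
		uint32_t earlier_flags : 16; /* placeholder for fields not shown in the diff */
		uint32_t enable_mes_sch_stb_log : 1;
		uint32_t limit_single_process : 1;
		uint32_t is_strix_tmz_wa_enabled : 1;
		uint32_t enable_lr_compute_wa : 1; /* new bit taken from reserved */
		uint32_t reserved : 12;            /* was 13 before the patch */
	};
	uint32_t uint32_t_all;
};

/* Fails to compile if the bit-fields no longer fit one 32-bit word. */
_Static_assert(sizeof(union example_v11_flags) == sizeof(uint32_t),
	       "flag bit-fields must total exactly 32 bits");
```

If `reserved` were left at 13 after adding the new bit, the struct would spill into a second word and the assertion would fail at build time.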
