[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Sun, Ce(Overlord) <[email protected]> > Sent: Wednesday, May 7, 2025 10:25 AM > To: [email protected] > Cc: Zhou1, Tao <[email protected]>; Zhang, Hawking > <[email protected]>; Sun, Ce(Overlord) <[email protected]> > Subject: [PATCH] drm/amdgpu: Modify the count method of defer error > > The number of newly added de counts and the number of newly added error > addresses remain consistent > > Signed-off-by: Ce Sun <[email protected]> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 + > drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +++++++++-- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h > index 857693bcd8d4..52fb71c4ce9d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h > @@ -130,6 +130,7 @@ struct amdgpu_umc { > > /* active mask for umc node instance */ > unsigned long active_mask; > + unsigned long err_addr_cnt; > }; > > int amdgpu_umc_ras_sw_init(struct amdgpu_device *adev); diff --git > a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c > b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c > index 0e404c074975..eb3f99dbbcd7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c > @@ -262,6 +262,9 @@ static int umc_v12_0_convert_error_address(struct > amdgpu_device *adev, > soc_pa, channel_index, umc_inst); > } > > + if (err_data) > + adev->umc.err_addr_cnt = err_data->err_addr_cnt;
[Tao] better not to update it here. The error address can't be queried sometimes but error status and poison consumption are generated, adev->umc.err_addr_cnt should 0 in this case, but there is no chance to set it to 0 if it's configured here. > + > out: > return ret; > } > @@ -428,8 +431,12 @@ static int umc_v12_0_aca_bank_parser(struct aca_handle > *handle, struct aca_bank > bank->regs[ACA_REG_IDX_ADDR]); > > ext_error_code = ACA_REG__STATUS__ERRORCODEEXT(status); > - count = ext_error_code == 0 ? > - ACA_REG__MISC0__ERRCNT(bank- > >regs[ACA_REG_IDX_MISC0]) : 1ULL; > + if (umc_v12_0_is_deferred_error(adev, status)) > + count = ext_error_code == 0 ? > + adev->umc.err_addr_cnt / adev->umc.retire_unit : 1ULL; > + else > + count = ext_error_code == 0 ? > + ACA_REG__MISC0__ERRCNT(bank- > >regs[ACA_REG_IDX_MISC0]) : 1ULL; > > return aca_error_cache_log_bank_error(handle, &info, err_type, count); > } > -- > 2.34.1
