[AMD Official Use Only - General]
_____________________________________________ From: Zhang, Hawking <[email protected]> Sent: Wednesday, January 3, 2024 5:36 PM To: Wang, Yang(Kevin) <[email protected]>; [email protected] Cc: Zhou1, Tao <[email protected]>; Chai, Thomas <[email protected]> Subject: RE: [PATCH 02/14] drm/amdgpu: add ACA kernel hardware error log support [AMD Official Use Only - General] + dev_info(adev->dev, "[Hardware error] Accelerator Check Architecture events logged\n"); + /* plus 1 for output format, e.g: ACA[08/08]: xxxx */ + for (i = 0; i < ARRAY_SIZE(aca_regs); i++) + dev_info(adev->dev, "[Hardware error] ACA[%02d/%02d].%s=0x%016llx\n", + idx + 1, total, aca_regs[i].name, bank->regs[aca_regs[i].reg_idx]); We should keep the ACA log format simple since there are tools like crash dumper that grab these logs. How about formatting log as below dev_info(adev->dev, "[Hardware error] Accelerator Check Architecture (ACA) events logged\n"); dev_info(adev→dev, "[Hardware error] ACA.%s=0x%016llx\n"); In general, if the idx doesn't convey useful information, then just replace it with ACA.Reg. [Kevin]: Agree, will update it in next version. Best Regards, Kevin Thoughts? Regards, Hawking -----Original Message----- From: Wang, Yang(Kevin) <[email protected]<mailto:[email protected]>> Sent: Wednesday, January 3, 2024 16:02 To: [email protected]<mailto:[email protected]> Cc: Zhang, Hawking <[email protected]<mailto:[email protected]>>; Zhou1, Tao <[email protected]<mailto:[email protected]>>; Chai, Thomas <[email protected]<mailto:[email protected]>>; Wang, Yang(Kevin) <[email protected]<mailto:[email protected]>> Subject: [PATCH 02/14] drm/amdgpu: add ACA kernel hardware error log support add ACA kernel hardware error log support. Signed-off-by: Yang Wang <[email protected]<mailto:[email protected]>> --- drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 29 +++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c index 6a6f167b5380..cadeda64eded 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c @@ -100,6 +100,33 @@ static int aca_smu_get_valid_aca_count(struct amdgpu_device *adev, enum aca_erro return smu_funcs->get_valid_aca_count(adev, type, count); } +static struct aca_regs_dump { + const char *name; + int reg_idx; +} aca_regs[] = { + {"CONTROL", ACA_REG_IDX_CTL}, + {"STATUS", ACA_REG_IDX_STATUS}, + {"ADDR", ACA_REG_IDX_ADDR}, + {"MISC", ACA_REG_IDX_MISC0}, + {"CONFIG", ACA_REG_IDX_CONFG}, + {"IPID", ACA_REG_IDX_IPID}, + {"SYND", ACA_REG_IDX_SYND}, + {"DESTAT", ACA_REG_IDX_DESTAT}, + {"DEADDR", ACA_REG_IDX_DEADDR}, + {"CONTROL_MASK", ACA_REG_IDX_CTL_MASK}, +}; + +static void aca_smu_bank_dump(struct amdgpu_device *adev, int idx, int +total, struct aca_bank *bank) { + int i; + + dev_info(adev->dev, "[Hardware error] Accelerator Check Architecture events logged\n"); + /* plus 1 for output format, e.g: ACA[08/08]: xxxx */ + for (i = 0; i < ARRAY_SIZE(aca_regs); i++) + dev_info(adev->dev, "[Hardware error] ACA[%02d/%02d].%s=0x%016llx\n", + idx + 1, total, aca_regs[i].name, bank->regs[aca_regs[i].reg_idx]); +} + static int aca_smu_get_valid_aca_banks(struct amdgpu_device *adev, enum aca_error_type type, int start, int count, struct aca_banks *banks) @@ -137,6 +164,8 @@ static int aca_smu_get_valid_aca_banks(struct amdgpu_device *adev, enum aca_erro if (ret) return ret; + aca_smu_bank_dump(adev, i, count, &bank); + ret = aca_banks_add_bank(banks, &bank); if (ret) return ret; -- 2.34.1
