Hi Tao, I'm thinking of checking ip block first because we might want to leverage this interrupt handler as a common entry for other RAS interrupt as well. In such case, it looks more clearer if we check the ip block first.
I agree with you either way looks good. You can have my RB for pushing the patches - We are always on the way to improve code quality. The series is Reviewed-by: Hawking Zhang <[email protected]> Regards, Hawking -----Original Message----- From: Zhou1, Tao <[email protected]> Sent: Friday, April 22, 2022 12:04 To: Zhang, Hawking <[email protected]>; [email protected]; Yang, Stanley <[email protected]>; Lazar, Lijo <[email protected]>; Ziya, Mohammad zafar <[email protected]>; Chai, Thomas <[email protected]> Subject: RE: [PATCH 1/3] drm/amdgpu: add RAS poison creation handler (v2) [AMD Official Use Only] Hi Hawking, The logic in my patch is: if (poison) if (umc) poison_creation_handler else poison_consumption_handler else if (umc) umc_handler else not supported I think your suggestion is: if (umc) if (poison) poison_creation_handler else umc_handler else if (poiosn) poison_consumption_handler else not supported Either way is OK for me, I don't think one approach is better than another. Regards, Tao > -----Original Message----- > From: Zhang, Hawking <[email protected]> > Sent: Thursday, April 21, 2022 10:54 PM > To: Zhou1, Tao <[email protected]>; [email protected]; > Yang, Stanley <[email protected]>; Lazar, Lijo > <[email protected]>; Ziya, Mohammad zafar > <[email protected]>; Chai, Thomas <[email protected]> > Cc: Zhou1, Tao <[email protected]> > Subject: RE: [PATCH 1/3] drm/amdgpu: add RAS poison creation handler > (v2) > > [AMD Official Use Only] > > Hi Tao, > > I was thinking more aggressive change - current > amdgpu_ras_interrupt_handler only serves as umc poison (poison mode) > or uncorrectable error handler (fue mode). > > We can still keep it as a unified entry point, but how about check ip > block first, then if it is umc, then check poison mode to decide > poison creation handling or legacy uncorrectable error handling. > > As discussed before, we'll optimize the poison creation handling > later. so keeping poison_creation_handler and legacy umc ue handler in > separated functions seems right direction to me. > > Thoughts? > > Regards, > Hawking > > -----Original Message----- > From: amd-gfx <[email protected]> On Behalf Of Tao > Zhou > Sent: Wednesday, April 20, 2022 19:30 > To: [email protected]; Zhang, Hawking > <[email protected]>; Yang, Stanley <[email protected]>; Lazar, > Lijo <[email protected]>; Ziya, Mohammad zafar > <[email protected]>; Chai, Thomas <[email protected]> > Cc: Zhou1, Tao <[email protected]> > Subject: [PATCH 1/3] drm/amdgpu: add RAS poison creation handler (v2) > > Prepare for the implementation of poison consumption handler. > > v2: separate umc handler from poison creation. > > Signed-off-by: Tao Zhou <[email protected]> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 70 > ++++++++++++++++--------- > 1 file changed, 44 insertions(+), 26 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index 4bbed76b79c8..22477f76913a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -1515,12 +1515,45 @@ static int amdgpu_ras_fs_fini(struct > amdgpu_device *adev) > /* ras fs end */ > > /* ih begin */ > +static void amdgpu_ras_interrupt_poison_creation_handler(struct > ras_manager *obj, > + struct amdgpu_iv_entry *entry) { > + dev_info(obj->adev->dev, > + "Poison is created, no user action is needed.\n"); } > + > +static void amdgpu_ras_interrupt_umc_handler(struct ras_manager *obj, > + struct amdgpu_iv_entry *entry) { > + struct ras_ih_data *data = &obj->ih_data; > + struct ras_err_data err_data = {0, 0, 0, NULL}; > + int ret; > + > + if (!data->cb) > + return; > + > + /* Let IP handle its data, maybe we need get the output > + * from the callback to update the error type/count, etc > + */ > + ret = data->cb(obj->adev, &err_data, entry); > + /* ue will trigger an interrupt, and in that case > + * we need do a reset to recovery the whole system. > + * But leave IP do that recovery, here we just dispatch > + * the error. > + */ > + if (ret == AMDGPU_RAS_SUCCESS) { > + /* these counts could be left as 0 if > + * some blocks do not count error number > + */ > + obj->err_data.ue_count += err_data.ue_count; > + obj->err_data.ce_count += err_data.ce_count; > + } > +} > + > static void amdgpu_ras_interrupt_handler(struct ras_manager *obj) { > struct ras_ih_data *data = &obj->ih_data; > struct amdgpu_iv_entry entry; > - int ret; > - struct ras_err_data err_data = {0, 0, 0, NULL}; > > while (data->rptr != data->wptr) { > rmb(); > @@ -1531,30 +1564,15 @@ static void > amdgpu_ras_interrupt_handler(struct > ras_manager *obj) > data->rptr = (data->aligned_element_size + > data->rptr) % data->ring_size; > > - if (data->cb) { > - if (amdgpu_ras_is_poison_mode_supported(obj->adev) && > - obj->head.block == AMDGPU_RAS_BLOCK__UMC) > - dev_info(obj->adev->dev, > - "Poison is created, no user > action is needed.\n"); > - else { > - /* Let IP handle its data, maybe we need get > the output > - * from the callback to udpate the error > type/count, etc > - */ > - memset(&err_data, 0, sizeof(err_data)); > - ret = data->cb(obj->adev, &err_data, &entry); > - /* ue will trigger an interrupt, and in that > case > - * we need do a reset to recovery the whole > system. > - * But leave IP do that recovery, here we > just dispatch > - * the error. > - */ > - if (ret == AMDGPU_RAS_SUCCESS) { > - /* these counts could be left as 0 if > - * some blocks do not count error > number > - */ > - obj->err_data.ue_count += > err_data.ue_count; > - obj->err_data.ce_count += > err_data.ce_count; > - } > - } > + if (amdgpu_ras_is_poison_mode_supported(obj->adev)) { > + if (obj->head.block == AMDGPU_RAS_BLOCK__UMC) > + > amdgpu_ras_interrupt_poison_creation_handler(obj, &entry); > + } else { > + if (obj->head.block == AMDGPU_RAS_BLOCK__UMC) > + amdgpu_ras_interrupt_umc_handler(obj, &entry); > + else > + dev_warn(obj->adev->dev, > + "No RAS interrupt handler for > +non-UMC block with poison disabled.\n"); > } > } > } > -- > 2.35.1 >
