[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang <[email protected]>

Per our discussion, please send a separate patch that replaces every "DRM_INFO" with "dev_info" in the per-IP query_ras_error_count callback functions, so that we have a clear picture of which errors come from which nodes when all the RAS errors are harvested in one GPU recovery worker.

Regards,
Hawking

From: Clements, John <[email protected]>
Sent: Tuesday, April 7, 2020 11:03
To: [email protected]; Zhang, Hawking <[email protected]>; Chen, Guchun <[email protected]>; Li, Dennis <[email protected]>; Zhou1, Tao <[email protected]>
Subject: [PATCH] drm/amdgpu: resolve mGPU RAS query instability

Submitting a patch to resolve an issue where, upon receiving an uncorrectable RAS error, the RAS ISR is triggered on all GPU nodes, creating a race condition between querying the RAS errors and entering the GPU reset sequence.
