[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang <[email protected]>

Per discussion, please send a separate patch to replace all the "DRM_INFO" 
calls with "dev_info" in the per-IP query_ras_error_count callback functions, 
so that we have a clear picture of which errors come from which nodes when 
harvesting all the RAS errors in one GPU recovery worker.
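
To illustrate why the device-prefixed form helps on mGPU systems, here is a 
rough user-space sketch, not the amdgpu code. The struct mock_device type, 
the two format_* helpers, the message text and the PCI addresses are all 
hypothetical, and the prefixes only approximate what DRM_INFO and dev_info 
print in the kernel.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct device; only the
 * device name (for example a PCI address) matters for this sketch. */
struct mock_device {
    const char *name;
};

/* DRM_INFO-style line: fixed "[drm]" prefix, no device identity, so
 * every GPU node produces the same text for the same error count. */
static int format_drm_style(char *buf, size_t len, unsigned long ce_count)
{
    return snprintf(buf, len, "[drm] %lu correctable errors detected",
                    ce_count);
}

/* dev_info-style line: prefixed with the driver and device name, so
 * each line can be traced back to the node that reported it. */
static int format_dev_style(char *buf, size_t len,
                            const struct mock_device *dev,
                            unsigned long ce_count)
{
    return snprintf(buf, len, "amdgpu %s: %lu correctable errors detected",
                    dev->name, ce_count);
}
```

With two nodes reporting the same count, the first helper yields identical 
lines, while the second yields lines that differ by the device name.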

Regards,
Hawking

From: Clements, John <[email protected]>
Sent: Tuesday, April 7, 2020 11:03
To: [email protected]; Zhang, Hawking <[email protected]>; 
Chen, Guchun <[email protected]>; Li, Dennis <[email protected]>; Zhou1, Tao 
<[email protected]>
Subject: [PATCH] drm/amdgpu: resolve mGPU RAS query instability


Submitting a patch to resolve an issue where, upon receiving an uncorrectable 
RAS error, the RAS ISR is triggered on all GPU nodes, creating a race condition 
between querying the RAS errors and entering the GPU reset sequence.