[AMD Public Use]

Thanks for the clarification, Dennis. So this is a kind of race condition between a normal GPU reset and a RAS GPU reset. I'm fine with the change. The patch is
Reviewed-by: Hawking Zhang <[email protected]>

Regards,
Hawking

-----Original Message-----
From: Li, Dennis <[email protected]>
Sent: Wednesday, October 14, 2020 18:08
To: Zhang, Hawking <[email protected]>; [email protected]; Deucher, Alexander <[email protected]>; Kuehling, Felix <[email protected]>; Koenig, Christian <[email protected]>
Subject: RE: [PATCH] drm/amdgpu: protect eeprom update from GPU reset

[AMD Public Use]

Hi, Hawking,

The driver has multiple paths into GPU reset, so it cannot guarantee that the bad record update has completed before a GPU reset.

Best Regards
Dennis Li

-----Original Message-----
From: Zhang, Hawking <[email protected]>
Sent: Wednesday, October 14, 2020 5:52 PM
To: Li, Dennis <[email protected]>; [email protected]; Deucher, Alexander <[email protected]>; Kuehling, Felix <[email protected]>; Koenig, Christian <[email protected]>
Cc: Li, Dennis <[email protected]>
Subject: RE: [PATCH] drm/amdgpu: protect eeprom update from GPU reset

[AMD Public Use]

Hmm, I think the bad page record update is done ahead of scheduling the GPU reset work. For the mGPU case, shall we walk through all the nodes in a hive before issuing the GPU reset work?

Regards,
Hawking

-----Original Message-----
From: Dennis Li <[email protected]>
Sent: Wednesday, October 14, 2020 17:41
To: [email protected]; Deucher, Alexander <[email protected]>; Kuehling, Felix <[email protected]>; Zhang, Hawking <[email protected]>; Koenig, Christian <[email protected]>
Cc: Li, Dennis <[email protected]>
Subject: [PATCH] drm/amdgpu: protect eeprom update from GPU reset

Because i2c is unstable during GPU reset, the driver needs to protect the EEPROM update from GPU reset so that no bad page record is missed.

Signed-off-by: Dennis Li <[email protected]>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0e64c39a2372..695bcfc5c983 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -149,7 +149,11 @@ static int __update_table_header(struct amdgpu_ras_eeprom_control *control,
 
 	msg.addr = control->i2c_address;
 
+	/* i2c may be unstable in gpu reset */
+	down_read(&adev->reset_sem);
 	ret = i2c_transfer(&adev->pm.smu_i2c, &msg, 1);
+	up_read(&adev->reset_sem);
+
 	if (ret < 1)
 		DRM_ERROR("Failed to write EEPROM table header, ret:%d", ret);
 
@@ -557,7 +561,11 @@ int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
 		control->next_addr += EEPROM_TABLE_RECORD_SIZE;
 	}
 
+	/* i2c may be unstable in gpu reset */
+	down_read(&adev->reset_sem);
 	ret = i2c_transfer(&adev->pm.smu_i2c, msgs, num);
+	up_read(&adev->reset_sem);
+
 	if (ret < 1) {
 		DRM_ERROR("Failed to process EEPROM table records, ret:%d",
 			  ret);
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
