[AMD Public Use]
Hi Hawking,
The driver has multiple paths into GPU reset, so it cannot guarantee that the
bad page record update has completed before a GPU reset starts.
Best Regards
Dennis Li
-----Original Message-----
From: Zhang, Hawking <[email protected]>
Sent: Wednesday, October 14, 2020 5:52 PM
To: Li, Dennis <[email protected]>; [email protected]; Deucher,
Alexander <[email protected]>; Kuehling, Felix
<[email protected]>; Koenig, Christian <[email protected]>
Cc: Li, Dennis <[email protected]>
Subject: RE: [PATCH] drm/amdgpu: protect eeprom update from GPU reset
Hmm, I think the bad page record update is done ahead of scheduling the GPU reset work.
For the mGPU case, shall we walk through all the nodes in a hive before issuing the GPU
reset work?
Regards,
Hawking
-----Original Message-----
From: Dennis Li <[email protected]>
Sent: Wednesday, October 14, 2020 17:41
To: [email protected]; Deucher, Alexander
<[email protected]>; Kuehling, Felix <[email protected]>; Zhang,
Hawking <[email protected]>; Koenig, Christian <[email protected]>
Cc: Li, Dennis <[email protected]>
Subject: [PATCH] drm/amdgpu: protect eeprom update from GPU reset
Because i2c is unstable during GPU reset, the driver needs to protect EEPROM updates
from GPU reset so that no bad page record is missed.
Signed-off-by: Dennis Li <[email protected]>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0e64c39a2372..695bcfc5c983 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -149,7 +149,11 @@ static int __update_table_header(struct amdgpu_ras_eeprom_control *control,
msg.addr = control->i2c_address;
+ /* i2c may be unstable in gpu reset */
+ down_read(&adev->reset_sem);
ret = i2c_transfer(&adev->pm.smu_i2c, &msg, 1);
+ up_read(&adev->reset_sem);
+
if (ret < 1)
DRM_ERROR("Failed to write EEPROM table header, ret:%d", ret);
@@ -557,7 +561,11 @@ int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
control->next_addr += EEPROM_TABLE_RECORD_SIZE;
}
+ /* i2c may be unstable in gpu reset */
+ down_read(&adev->reset_sem);
ret = i2c_transfer(&adev->pm.smu_i2c, msgs, num);
+ up_read(&adev->reset_sem);
+
if (ret < 1) {
DRM_ERROR("Failed to process EEPROM table records, ret:%d",
ret);
--
2.17.1
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx