SRIOV should not call amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev,
nor amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, SRIOV already calls amdgpu_amdkfd_pre_reset
and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
These two functions need to run inside the window opened by SRIOV's
amdgpu_virt_request_full_gpu, otherwise the SRIOV reset hangs.
The amdgpu_amdkfd_pre_reset call inside amdgpu_device_lock_adev is therefore a
duplicate for SRIOV, and it causes an SRIOV hang on entering
amdgpu_device_lock_adev.
That is the reason for adding "if (!amdgpu_sriov_vf(adev))" on top of branch
amd-staging-dkms-4.18.
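
To make the ordering argument concrete, here is a minimal user-space sketch in
C. It is only a model and is not driver code: struct mock_adev, every mock_*
function and the skip_kfd_on_vf flag are hypothetical names invented for this
illustration. It assumes, as described above, that the SRIOV reset path calls
the KFD pre/post hooks itself while it holds full GPU access. The counter
calls_outside_window records each KFD call made on a VF without that access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device; NOT the real driver type. */
struct mock_adev {
	bool is_sriov_vf;         /* models amdgpu_sriov_vf(adev) */
	bool has_full_gpu;        /* true between request and release of full GPU */
	int kfd_pre_resets;       /* calls to the mocked KFD pre_reset hook */
	int kfd_post_resets;      /* calls to the mocked KFD post_reset hook */
	int calls_outside_window; /* KFD calls on a VF without full GPU access */
};

/* Record a KFD call that happens on a VF outside the full-GPU window. */
static void mock_note_kfd_call(struct mock_adev *adev)
{
	if (adev->is_sriov_vf && !adev->has_full_gpu)
		adev->calls_outside_window++;
}

static void mock_kfd_pre_reset(struct mock_adev *adev)
{
	assert(adev != NULL);
	adev->kfd_pre_resets++;
	mock_note_kfd_call(adev);
}

static void mock_kfd_post_reset(struct mock_adev *adev)
{
	assert(adev != NULL);
	adev->kfd_post_resets++;
	mock_note_kfd_call(adev);
}

/* Models amdgpu_device_lock_adev; the flag models the proposed guard. */
static void mock_lock_adev(struct mock_adev *adev, bool skip_kfd_on_vf)
{
	if (!(skip_kfd_on_vf && adev->is_sriov_vf))
		mock_kfd_pre_reset(adev);
}

/* Models amdgpu_device_unlock_adev with the same guard. */
static void mock_unlock_adev(struct mock_adev *adev, bool skip_kfd_on_vf)
{
	if (!(skip_kfd_on_vf && adev->is_sriov_vf))
		mock_kfd_post_reset(adev);
}

/* Models the SRIOV reset path: KFD hooks run while full GPU is held. */
static void mock_reset_sriov(struct mock_adev *adev)
{
	adev->has_full_gpu = true;  /* stands in for request_full_gpu */
	mock_kfd_pre_reset(adev);
	/* the actual reset work would happen here */
	mock_kfd_post_reset(adev);
	adev->has_full_gpu = false; /* stands in for releasing full GPU */
}

/* Models the overall recovery flow: lock, reset, unlock. */
void mock_gpu_recover(struct mock_adev *adev, bool skip_kfd_on_vf)
{
	assert(adev != NULL);
	mock_lock_adev(adev, skip_kfd_on_vf);
	if (adev->is_sriov_vf)
		mock_reset_sriov(adev);
	/* the bare-metal reset itself is not modelled */
	mock_unlock_adev(adev, skip_kfd_on_vf);
}
```

Calling mock_gpu_recover() on a VF with skip_kfd_on_vf set to false models the
current code: the hooks run twice and two of the calls land outside the
window. With the flag set to true each hook runs once, inside the window, and
the bare-metal path still gets its single pre/post pair.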

BR,
Wentao

-----Original Message-----
From: Liu, Shaoyun <shaoyun....@amd.com> 
Sent: Tuesday, December 11, 2018 12:10 AM
To: Lou, Wentao <wentao....@amd.com>; amd-gfx@lists.freedesktop.org; 
Grodzovsky, Andrey <andrey.grodzov...@amd.com>; Kuehling, Felix 
<felix.kuehl...@amd.com>
Cc: Lou, Wentao <wentao....@amd.com>
Subject: RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov 
hang

But KFD still needs to be notified during reset. The pre_reset call to KFD
gives KFD a chance to suspend all the running process queues. Did reset work
normally on SRIOV before the refactoring change for XGMI support? We
shouldn't change the logic.

Regards
shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of wentalou
Sent: Friday, December 7, 2018 1:09 AM
To: amd-gfx@lists.freedesktop.org
Cc: Lou, Wentao <wentao....@amd.com>
Subject: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev, which is
outside the req_full_gpu window of SRIOV.
This makes SRIOV hang during reset.

Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <wentao....@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
        mutex_lock(&adev->lock_reset);
        atomic_inc(&adev->gpu_reset_counter);
        adev->in_gpu_reset = 1;
-       /* Block kfd */
-       amdgpu_amdkfd_pre_reset(adev);
+       /* Block kfd: SRIOV would do it separately */
+       if (!amdgpu_sriov_vf(adev))
+                amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-       /*unlock kfd */
-       amdgpu_amdkfd_post_reset(adev);
+       /*unlock kfd: SRIOV would do it separately */
+       if (!amdgpu_sriov_vf(adev))
+                amdgpu_amdkfd_post_reset(adev);
        amdgpu_vf_error_trans_all(adev);
        adev->in_gpu_reset = 0;
        mutex_unlock(&adev->lock_reset);
--
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx