I agree with disabling debugfs for amdgpu_reset when SRIOV detected. -----Original Message----- From: Christian König [mailto:[email protected]] Sent: Monday, May 08, 2017 5:20 PM To: Liu, Monk <[email protected]>; [email protected] Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset
> You know that gpu reset under SR-IOV will have very big impact on all other > VFs ... Mhm, good argument. But in this case we need to give at least some warning message instead of doing nothing. Or even better disable creating the amdgpu_reste debugfs file altogether. This way nobody will wonder why using it doesn't trigger anything. Christian. Am 08.05.2017 um 11:10 schrieb Liu, Monk: > For SR-IOV use case, we call gpu reset under the case we have no choice ... > > So many places like debug fs shouldn't a good reason to trigger gpu > reset > > You know that gpu reset under SR-IOV will have very big impact on all other > VFs ... > > BR Monk > > -----Original Message----- > From: Christian König [mailto:[email protected]] > Sent: Monday, May 08, 2017 5:08 PM > To: Liu, Monk <[email protected]>; [email protected] > Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in > gpu-reset > > Am 08.05.2017 um 08:51 schrieb Monk Liu: >> because we don't want to do sriov-gpu-reset under certain cases, so >> just split those two funtion and don't invoke sr-iov one from >> bare-metal one. >> >> Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0 >> Signed-off-by: Monk Liu <[email protected]> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 ++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 +++++- >> 5 files changed, 10 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 45a60a6..4985a7e 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -2652,9 +2652,6 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev) >> int resched; >> bool need_full_reset; >> >> - if (amdgpu_sriov_vf(adev)) >> - return amdgpu_sriov_gpu_reset(adev, true); >> - >> if (!amdgpu_check_soft_reset(adev)) { >> DRM_INFO("No hardware hang detected. Did some blocks stall?\n"); >> return 0; >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> index 5772ef2..d7523d1 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> @@ -651,7 +651,8 @@ static int amdgpu_debugfs_gpu_reset(struct seq_file *m, >> void *data) >> struct amdgpu_device *adev = dev->dev_private; >> >> seq_printf(m, "gpu reset\n"); >> - amdgpu_gpu_reset(adev); >> + if (!amdgpu_sriov_vf(adev)) >> + amdgpu_gpu_reset(adev); > Well that is clearly not a good idea. Why do you want to disable the reset > here? > >> >> return 0; >> } >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> index 67be795..5bcbea0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> @@ -221,7 +221,7 @@ void amdgpu_gem_object_close(struct >> drm_gem_object *obj, >> >> static int amdgpu_gem_handle_lockup(struct amdgpu_device *adev, int r) >> { >> - if (r == -EDEADLK) { >> + if (r == -EDEADLK && !amdgpu_sriov_vf(adev)) { > Not a problem of your patch, but that stuff is outdated and should have been > removed completely years ago. Going to take care of that. > >> r = amdgpu_gpu_reset(adev); >> if (!r) >> r = -EAGAIN; >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c >> index f8a6c95..49c6e6e 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c >> @@ -89,7 +89,8 @@ static void amdgpu_irq_reset_work_func(struct work_struct >> *work) >> struct amdgpu_device *adev = container_of(work, struct amdgpu_device, >> reset_work); >> >> - amdgpu_gpu_reset(adev); >> + if (!amdgpu_sriov_vf(adev)) >> + amdgpu_gpu_reset(adev); > Mhm, that disables the reset on an invalid register access or invalid command > stream. Is that really what we want? > > Christian. > >> } >> >> /* Disable *all* interrupts */ >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> index 690ef3d..c7718af 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> @@ -36,7 +36,11 @@ static void amdgpu_job_timedout(struct amd_sched_job >> *s_job) >> job->base.sched->name, >> atomic_read(&job->ring->fence_drv.last_seq), >> job->ring->fence_drv.sync_seq); >> - amdgpu_gpu_reset(job->adev); >> + >> + if (amdgpu_sriov_vf(job->adev)) >> + amdgpu_sriov_gpu_reset(job->adev, true); >> + else >> + amdgpu_gpu_reset(job->adev); >> } >> >> int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, > > _______________________________________________ > amd-gfx mailing list > [email protected] > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/amd-gfx
