I agree with disabling debugfs for amdgpu_reset when SRIOV detected. 

-----Original Message-----
From: Christian König [mailto:[email protected]] 
Sent: Monday, May 08, 2017 5:20 PM
To: Liu, Monk <[email protected]>; [email protected]
Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

> You know that gpu reset under SR-IOV will have very big impact on all other 
> VFs ...
Mhm, good argument. But in this case we need to give at least some warning 
message instead of doing nothing.

Or even better disable creating the amdgpu_reste debugfs file altogether. This 
way nobody will wonder why using it doesn't trigger anything.

Christian.

Am 08.05.2017 um 11:10 schrieb Liu, Monk:
> For SR-IOV use case, we call gpu reset under the case we have no choice ...
>
> So many places like debug fs shouldn't a good reason to trigger gpu 
> reset
>
> You know that gpu reset under SR-IOV will have very big impact on all other 
> VFs ...
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:[email protected]]
> Sent: Monday, May 08, 2017 5:08 PM
> To: Liu, Monk <[email protected]>; [email protected]
> Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in 
> gpu-reset
>
> Am 08.05.2017 um 08:51 schrieb Monk Liu:
>> because we don't want to do sriov-gpu-reset under certain cases, so 
>> just split those two funtion and don't invoke sr-iov one from 
>> bare-metal one.
>>
>> Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0
>> Signed-off-by: Monk Liu <[email protected]>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 3 ++-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 2 +-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 3 ++-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 6 +++++-
>>    5 files changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 45a60a6..4985a7e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2652,9 +2652,6 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
>>      int resched;
>>      bool need_full_reset;
>>    
>> -    if (amdgpu_sriov_vf(adev))
>> -            return amdgpu_sriov_gpu_reset(adev, true);
>> -
>>      if (!amdgpu_check_soft_reset(adev)) {
>>              DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>>              return 0;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 5772ef2..d7523d1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -651,7 +651,8 @@ static int amdgpu_debugfs_gpu_reset(struct seq_file *m, 
>> void *data)
>>      struct amdgpu_device *adev = dev->dev_private;
>>    
>>      seq_printf(m, "gpu reset\n");
>> -    amdgpu_gpu_reset(adev);
>> +    if (!amdgpu_sriov_vf(adev))
>> +            amdgpu_gpu_reset(adev);
> Well that is clearly not a good idea. Why do you want to disable the reset 
> here?
>
>>    
>>      return 0;
>>    }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 67be795..5bcbea0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -221,7 +221,7 @@ void amdgpu_gem_object_close(struct 
>> drm_gem_object *obj,
>>    
>>    static int amdgpu_gem_handle_lockup(struct amdgpu_device *adev, int r)
>>    {
>> -    if (r == -EDEADLK) {
>> +    if (r == -EDEADLK && !amdgpu_sriov_vf(adev)) {
> Not a problem of your patch, but that stuff is outdated and should have been 
> removed completely years ago. Going to take care of that.
>
>>              r = amdgpu_gpu_reset(adev);
>>              if (!r)
>>                      r = -EAGAIN;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index f8a6c95..49c6e6e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -89,7 +89,8 @@ static void amdgpu_irq_reset_work_func(struct work_struct 
>> *work)
>>      struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
>>                                                reset_work);
>>    
>> -    amdgpu_gpu_reset(adev);
>> +    if (!amdgpu_sriov_vf(adev))
>> +            amdgpu_gpu_reset(adev);
> Mhm, that disables the reset on an invalid register access or invalid command 
> stream. Is that really what we want?
>
> Christian.
>
>>    }
>>    
>>    /* Disable *all* interrupts */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 690ef3d..c7718af 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -36,7 +36,11 @@ static void amdgpu_job_timedout(struct amd_sched_job 
>> *s_job)
>>                job->base.sched->name,
>>                atomic_read(&job->ring->fence_drv.last_seq),
>>                job->ring->fence_drv.sync_seq);
>> -    amdgpu_gpu_reset(job->adev);
>> +
>> +    if (amdgpu_sriov_vf(job->adev))
>> +            amdgpu_sriov_gpu_reset(job->adev, true);
>> +    else
>> +            amdgpu_gpu_reset(job->adev);
>>    }
>>    
>>    int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>
> _______________________________________________
> amd-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to